Preface


A precise definition of complexity has long been debated, and many
definitions exist; this report briefly compares and contrasts them.
It then settles upon algorithmic complexity, specifically Kolmogorov
Complexity, as a working definition and proceeds to explore the
paradigm shift that appears when communication network fault
tolerance is viewed through the lens of Kolmogorov Complexity. Our
application of complexity to communication networking is directed
toward both the current and next-generation Internet. Many emerging
communication network technologies are discussed in this report,
such as self-healing networks, intelligent and predictive networks,
active networks, predictive network management, and information
assurance and network security technology. However, a common thread,
namely the role of complexity, emerges to tie together these
previously disparate technologies in new ways. In addition, the
application to information assurance is an extremely timely topic,
given Microsoft's recently announced focus on fixing its products'
security flaws. Recent tragic terrorist events clearly demonstrate
that the civilian and military internets are vulnerable targets. We
hope to spark new ideas in multidisciplinary fields; we are standing
on the shoulders of two giants, the communication networking
community and the founding fathers of algorithmic information theory,
while attempting to make the work of both communities understandable
to each other as well as to a general audience, in the hope of
synthesizing new ideas in the minds of all readers.

—  Stephen  F.  Bush  February  03,  2002 
Summary


A seminal contribution of this effort is the development of a
complexity-based information assurance metric for vulnerability
analysis. The metric proposed is Kolmogorov Complexity. Advances in
computable estimates of Kolmogorov Complexity are indicated, as well
as additional applications of Kolmogorov Complexity for
fault-tolerant communications in general. Unless vulnerabilities can
be identified and measured, the information assurance of a system can
never be properly designed or guaranteed. An underlying definition of
information security is hypothesized based upon the attacker and
defender as reasoning entities, capable of learning to outwit one
another. Estimates of Kolmogorov Complexity provide an objective
parameter with which to provide information assurance through anomaly
detection and objective model development. The capability of this
metric is limited in part by the accuracy of its estimation, which
must be traded against computational expense. The Optimal Symbol
Compression Ratio Algorithm, used to estimate complexity and
sophistication, provides additional capability to discern anomalous
behavior in information systems. Further research is needed to
develop strategies for cost-effective use of this paradigm across
entire systems.

The desirable properties of a metric for security are examined
(Section 3.3). To further the development of a realistic metric, a
general model for studying information assurance is proposed
(Section 4). Next, a definition of vulnerability is proposed in terms
of a new model based on Turing Machines (Hypothesis 4.1), and
engineered properties of information assurance, with an analogy to
mechanical engineering, are proposed in terms of the new model. The
analogy with mechanical engineering is called Brittle Systems
(Section 5) and involves the design of information assurance in a
manner that accounts for tradeoffs in performance and degradation of
information assurance in a system. Information assurance is also
examined from the perspective of set theory and a topological space
(Section 3.5). This is particularly relevant to understanding the
operation of the metric with regard to secure composition and the
inherent limits of applying safeguards to a system.

The advantages and drawbacks of Kolmogorov Complexity are discussed,
including its incomputable nature. However, computable estimates
(Section 6.2) of Kolmogorov Complexity are explained, as well as
additional useful applications of Kolmogorov Complexity for
communications in general. These additional applications are
important because they demonstrate how information assurance is an
integral part of information system design. Next, Theorems 6.1 and
6.2 concerning the conservation of complexity (Section 6.7) within an
information system were discussed. This led to a Swarm experiment
that monitors the evolution of complexity in a dynamic and complex
system and examines our ability to monitor the complexity as it
evolves. Unless vulnerabilities can be identified and measured, the
information assurance of a system can never be properly designed or
guaranteed. Results from a study on complexity evolving within an
information system using Mathematica (Section 9.2), Swarm, and a new
Java complexity probe toolkit (Section 9.4), developed by this
project, were presented in this report. An underlying definition of
information security was hypothesized (Hypothesis 9.1) based upon the
attacker and defender as reasoning entities, capable of learning to
outwit one another. This leads to a study of the evolution of
complexity in an information system and the effects of the
environment upon the evolution of complexity. Understanding the
evolution of complexity in a system enables a better understanding of
how to measure and quantify the vulnerability of a system. Finally,
the design of the Java complexity probes toolkit under construction
for automated measurement of information assurance is presented
(Section 9.5). A dialog is included that contains typical questions
about the relationship between complexity and information assurance.
This dialog is best read after the introduction to Kolmogorov
Complexity (Section 6.1), or by someone already familiar with
complexity theory who wants a quick overview of the approach taken on
this project toward the relationship between complexity and
information assurance.

Another result of this project is the concept of conservation in the
evolution of complexity in a system and the search for a bound on the
change in complexity such that abnormal behavior can be detected when
the bound is exceeded. This work demonstrated a promising approach
for further exploration, using simulation, into the laws governing
complexity and the evolution of complexity within a system. Finally,
complexity probes were developed to enhance a security-engineering
tool based upon an electrical engineering paradigm, with complexity
as the resistance to insecurity flow.

Blindly applying current communication and computation technology on
MEMS devices would be fighting a losing battle against nature. The
proposition this report hoped to reinforce in the reader's mind was
that MEMS devices can be more efficiently engineered by working with,
instead of against, the environment in which they are placed.
Specifically, two approaches were proposed for revolutionary gains in
MEMS device communication. The first was to view all network devices
as computational or active devices. Computation can take many forms.
The amount of computation may vary, but every device has some type of
computation, either programmed or ambient. Use of computation in an
optimal manner is the same challenge faced by active networks. Thus,
advances in active networks and networks of MEMS devices are mutually
beneficial. The second approach was to optimize networks of MEMS
devices by exploiting emergence. Understanding emergence requires
understanding complexity; that relationship was touched upon relative
to networking in this report. Use of emergence shows promise as a
means to precisely engineer desired characteristics into systems of
MEMS devices, resulting in reduced size by removing unnecessary
computation and control.

This report shows that a genetic algorithm exhibits sudden decreases
in the complexity of the population between generations as the
algorithm evolves in response to the fitness function. Lower
complexity corresponds to greater homogeneity in the population and
greater fitness to the chosen criterion. Thus complexity can be used
as one indicator of progress in the evolution of the genetic
algorithm. A framework for testing the injection of fitness functions
into an active network that evolves solutions via a genetic
programming technique has been implemented. Future work involves
testing the response time to heal and the resiliency of the network
in the presence of faults.
Complexity is being studied in a myriad of disciplines and in many
different ways. Measures of complexity have been derived in an
attempt to understand complexity, but they have disadvantages: either
they do not fully capture the nature of complexity, or they are
fundamentally incomputable.

INTRODUCTION TO COMPLEXITY

The word complexity comes to us from the Latin word complexus,
meaning 'in totality' or 'a whole set consisting of many
interconnecting parts.' The definition of the word includes the
connotation 'difficult to understand.' The Universe, as well as every
object in the Universe, consists of many interconnected parts. Humans
have attempted to reduce the apparent complexity of nature, in other
words to understand the Universe, by observing particular instances
of the operation of subsets of interconnected, or interacting, parts.
Hypotheses are generated, experiments to test those hypotheses are
developed, and the outcome of the experiments either reinforces or
counters the hypotheses. Variations on the original hypotheses, or
entirely new hypotheses, are generated and tested, and the cycle
continues. Science has progressed in this manner in search of the
essence, or most general underlying explanation, for as many
phenomena as possible. As discussed later in this report, the very
critical act of hypothesis formation and testing, the Scientific
Method, the foundation of science upon which Man depends for advances
in every aspect of civilization, is itself governed and characterized
by complexity.

How far can the Scientific Method, described above, take us in
understanding Complexity Theory? When one studies complexity as a
science, the focus becomes a simplified understanding of large
numbers of interactions. Subtle yet insidious problems render the
study of complexity challenging. The particular details of individual
parts, important in specifying interactions, are less important to
the understanding of complexity than the interactions. An exception
is when the parts themselves consist of many interacting parts. This
implies the existence of layers of complexity. How can one obtain a
perfectly closed subset of the Universe in which to test hypotheses
concerning complexity? The mere act of measuring any characteristic
of such a system violates closure. How can one be certain that there
are no interactions at some unknown level between the supposedly
closed system and the rest of the Universe? Is it possible for one
system to measure the complexity of another, more complex system?
These are some of the questions that we will explore in this report.
Gödel has demonstrated that a system cannot completely describe
itself with perfect fidelity. Berry's Paradox suggests that any
algorithm capable of computing complexity must be at least as complex
as the object whose complexity is being measured. How can one measure
the complexity of an algorithm that measures complexity without
requiring a more complex algorithm? Measuring the complexity of the
more complex algorithm requires an even more complex algorithm, ad
infinitum.

While studying Complexity Theory, researchers focused on the science
of interactions of large numbers of parts have noticed that something
amazing happens under certain conditions. Unexpectedly complex
results, based upon simple interactions, can occur. This is known as
emergence. Detecting and controlling emergence could also lead to
groundbreaking results. The implication is that programming simple
interactions, while letting emergent behavior handle the bulk of the
work in a robust manner, could control the desired characteristics of
a system. In other words, complexity theory, specifically through
emergence, could provide a new and much more efficient and robust
form of control. Ultimately, complexity theory and emergence could
progress to self-organizing systems. These are systems whose natural
tendency is to align in a form optimal to the task required. An
example is a self-healing system, that is, a system that inherently
re-forms to mitigate a fault.

MEASURES OF COMPLEXITY

There have been many attempts to define and measure complexity.
Attempts to define the complexity of a system might be broadly
described as attempts to remove portions, or patterns, of the system
that are simple, leaving behind the portions that are complex. The
size of the remaining portions of the system must then reflect the
complexity of the system. In Figure 1 an attempt is made to cluster
selected known complexity measures into categories. The purpose of
this figure is to show the great variety of complexity techniques and
to be able to discuss broad classes of techniques. With a few
specific exceptions, details of each technique will not be discussed.
The categories in this classification describe the complexity
estimation technique and thus its suitability to types of systems.
The arrows indicate subclasses of complexity estimation techniques.
The highest-level classifications are Static and Dynamic techniques.
Static techniques assume the system whose complexity is to be
estimated does not vary with time as the estimation calculation
executes. A snapshot of a Dynamic system is also considered a static
system. Dynamic techniques allow the system to change with time; in
fact, some require that the system under analysis change with time.

Figure 1. Definitions and Measurements of Complexity.

The Static complexity estimation techniques can be further
sub-classified into Algorithmic, Independent Description, and
Probabilistic categories. Algorithmic techniques attempt to use an
algorithm as a fundamental description of the complexity of a static
snapshot of a system. Independent Description techniques attempt to
count the number of irreducible components needed to represent a
system, such as the number of dimensions, number of independent
models, or number of irreducible components. Probabilistic techniques
generally assume that low complexity components are more likely than
higher complexity components. Probabilistic techniques also attempt
to use probability to determine the independence of subcomponents of
a system, allowing the system to be partitioned into independent
components. In this work, Algorithmic techniques are most relevant
because of the nature of computation in the form of executable
algorithms and its relation to the transmission of information in the
form of static data. Algorithmic techniques are further
sub-classified into Propositional Logic-based and Automata-based
techniques. Algorithmic techniques also include minimum description
methods that seek to estimate complexity by determining the smallest
size to which a description can be compacted. Complex descriptions,
containing a larger number of "random" interactions and lacking any
form of repeatable patterns, cannot be described as compactly as
simple descriptions. Kolmogorov Complexity is a complexity measure
that falls within this category.

Vladimir Gudkov has been exploring the minimum number of dimensions
required to characterize network information flow [163-166]. Gudkov's
work could be categorized in the Dynamic, Self-Organizing,
Evolutionary, Chaotic group of complexity theory techniques. He
approaches the development of network behavior description in terms
of numerical time-dependent functions of protocol parameters. This
provides a basis for the application of methods of mathematical and
theoretical physics to information flow analysis on networks and for
the extraction of patterns of typical network behavior. The
information traffic can be described as a trajectory in a
multi-dimensional parameter-time space with dimension of about 10-12.
The result of his work could help to improve our Kolmogorov
Complexity estimators.

Kolmogorov Complexity is a measure of the descriptive complexity
contained in an object. It refers to the minimum length of a program
such that a universal computer can generate a specific sequence. A
good introduction to Kolmogorov Complexity is contained in [118],
with a solid treatment in [10]. Kolmogorov Complexity is related to
Shannon entropy in that the expected value of K(x) for a random
sequence is approximately the entropy of the source distribution for
the process generating the sequence. However, Kolmogorov Complexity
differs from entropy in that it relates to the specific string being
considered rather than the source distribution.

The major difficulty with Kolmogorov Complexity is that it is not
computable. The length of any program that produces a given string is
an upper bound on the Kolmogorov Complexity of that string, but the
lower bound cannot be computed. Nevertheless, as will be discussed
later in this section, estimates have been shown to be useful in
providing information assurance and intrusion detection.
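The upper-bound argument can be illustrated with an off-the-shelf
compressor. The following is a minimal sketch, not the estimator
developed in this report: it assumes that zlib stands in for "any
program that produces the string," and the function name
complexity_upper_bound is hypothetical.

```python
import os
import zlib


def complexity_upper_bound(data: bytes) -> int:
    """Return the zlib-compressed length of data, in bytes.

    The compressed form together with a fixed decompressor is a program
    that regenerates data, so this length (up to an additive constant)
    bounds K(data) from above. It is an estimate, not K(data) itself.
    """
    return len(zlib.compress(data, 9))


# Assumed sample inputs: one with an obvious pattern, one without.
patterned = b"ab" * 500          # 1000 bytes, highly regular
random_like = os.urandom(1000)   # 1000 bytes with no exploitable pattern

print(complexity_upper_bound(patterned))
print(complexity_upper_bound(random_like))
```

The patterned input yields a far smaller bound than the random-like
input of the same length. A better compressor can only tighten the
bound; no compressor can certify that the bound has reached K(x).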

Kolmogorov Complexity is a measure of descriptive complexity that
refers to the minimum length of a program such that a universal
computer can generate a specific sequence. Universal computers can be
equated through programs of constant length; thus a mapping can be
made between universal computers of different types. The string x may
be either data or the description of a process in an actual system.
Unless otherwise specified, consider x to be the program for a Turing
Machine described in Definition 1.


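$$K_\varphi(x) = \min_{\varphi(p) = x} l(p) \qquad (1)$$

$$K_\varphi(y \mid x) = \begin{cases} \min_{\varphi(p, x) = y} l(p) \\ \infty, & \text{if there is no } p \text{ such that } \varphi(p, x) = y \end{cases} \qquad (2)$$

Here $\varphi$ denotes the universal computer, $p$ a program, and
$l(p)$ the length of $p$. Conditional Complexity, described in
Equation 2, quantifies the complexity of string y, given string x.
Intuitively, it is the additional complexity of string y beyond that
in string x. This definition of Kolmogorov Complexity is used
repeatedly throughout the remainder of this report.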
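A rough, computable stand-in for conditional complexity is the extra
compressed length that one string adds once another is already known.
The sketch below is an illustration under stated assumptions, not the
method used in this report: zlib is assumed as the compressor, and the
names compressed_len and conditional_estimate are hypothetical.

```python
import os
import zlib


def compressed_len(data: bytes) -> int:
    """Length in bytes of the zlib-compressed form of data."""
    return len(zlib.compress(data, 9))


def conditional_estimate(target: bytes, given: bytes) -> int:
    """Estimate the complexity of target when given is already known.

    Approximates the conditional complexity by C(given + target) - C(given),
    where C is the compressed length. The result is clamped at zero because
    a real compressor can occasionally emit fewer bytes for a longer input.
    """
    return max(0, compressed_len(given + target) - compressed_len(given))


# Assumed sample data for illustration only.
message = b"the quick brown fox jumps over the lazy dog " * 20
unrelated = os.urandom(len(message))

print(conditional_estimate(message, message))    # knowing the message already
print(conditional_estimate(message, unrelated))  # knowing only unrelated bytes
```

When the known string already contains the target, the estimate is
small; when the known string is unrelated, the target must be
described almost from scratch and the estimate is much larger.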

Kolmogorov Complexity has been shown to provide a useful framework
from which to study objective metrics and methodologies for achieving
information assurance. Recent results have shown promise for
complexity estimators to detect FTP exploits and DDoS attacks.
Complexity is attractive as a metric for information assurance
because it is an objective means of characterizing and modeling data
and information processes for the purpose of benchmarking normal
healthy behavior, identifying weaknesses, and detecting deviations
and anomalies.

Since exact measurement of Kolmogorov Complexity is not computable,
estimators are required. The accuracy and computational requirements
of estimators together determine the capability or practicality of
use for a given application. For example, the very crude complexity
estimate of empirical entropy carries very little overhead, yet is
suitable for some applications. Other applications can benefit from
complexity metrics when more expensive estimation algorithms are
utilized, but the computational expense may not be feasible.
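To make the "very crude" estimator concrete, the following sketch
computes zeroth-order empirical entropy over byte frequencies. It is
an illustration only; the function name empirical_entropy and the
choice of bytes as symbols are assumptions, not details taken from
this report.

```python
import math
from collections import Counter


def empirical_entropy(data: bytes) -> float:
    """Zeroth-order empirical entropy of data, in bits per symbol.

    Only symbol frequencies are used; the order of symbols is ignored,
    which is what makes this estimate cheap and also crude.
    """
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


print(empirical_entropy(b"aaaa"))            # one symbol only
print(empirical_entropy(b"abababab"))        # two equally likely symbols
print(empirical_entropy(b"aabbabba"))        # same frequencies, different order
print(empirical_entropy(bytes(range(256))))  # every byte value once
```

The second and third inputs receive the same score even though one is
a strict alternation and the other is not, because symbol order never
enters the calculation. A single pass over the data suffices, which is
why the overhead is so low.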

In this report we motivate the use of complexity metrics for
information assurance by discussing several applications, each of
which depends in some sense on accurate complexity estimators. We
then discuss and compare several ubiquitous complexity estimators,
their accuracy, and computational expense. Finally, we introduce a
new complexity estimator and benchmark its capability against others
for the FTP exploit detection application.
1. Methods and Assumptions


1.1  METHODS  AND  ASSUMPTIONS 

The project report is presented in the form of three hypotheses that
correspond with the Imperishable Network statement of work. Following
each hypothesis is a list of accomplishments relating to validation
of that hypothesis.

Hypothesis  1 

The first project goal is to explore the hypothesis that information
consisting of observations from an event, such as a fault or attack,
which is generated by a single root cause, is highly correlated.
Highly correlated data has a low complexity and a high compression
ratio. However, this project is not focusing on legacy static
compression algorithms, but rather on algorithmic compression.
Algorithmic compression involves code that can dynamically change
and, when executed, regenerates the intended data. The compression
code is designed to be a hypothesis about the data to be compressed.
The more accurate the hypothesis, the more efficient the compression.
Algorithmic compression and prediction are tightly linked; if a
program can predict data (generating more data than the program's
size), then it is, by definition, an algorithmically compressed form
of the data. Active networks form an ideal vehicle for transmitting
this form of fault information because they facilitate the
transmission of code within the network. The most highly compressed,
and thus most likely, fault representations are transmitted faster
and farther due to their smaller size. Kolmogorov Complexity, K(x),
measures the size of the smallest program capable of representing a
particular piece of data, thus providing guidance as to the optimal
amount of information within an active packet to be in the form of
code versus data. Specific accomplishments towards this goal are:

1. A new algorithm that incorporates both Kolmogorov Complexity and
entropy, facilitating our study of the relationship between them, has
been devised to both estimate complexity and perform compression.

2. A DDoS attack detection algorithm, based upon our hypotheses
regarding complexity theory, has been implemented. Testing is
underway within our Active Network testbed. The algorithm makes use
of a fundamental theorem of Kolmogorov Complexity we derived that
states: For any two strings X and Y, K(X,Y) <= K(X) + K(Y), where
K(X) and K(Y) are the complexities of the respective strings and
K(X,Y) is the joint complexity of the two strings. Stated more
simply, the joint Kolmogorov Complexity of two strings is less than
or equal to the sum of the complexities of the individual strings. In
other words, the joint complexity of the string (data stream)
decreases as the correlation within the string increases. This
property is exploited to distinguish between concerted
denial-of-service attacks and cases of traffic overload. The
assumption is that an attacker performs an attack using large numbers
of correlated packets generated from different locations but intended
for the same destination. Thus, there is a lot of similarity in the
traffic pattern. A Kolmogorov Complexity-based detection algorithm
can quickly identify such patterns. On the other hand, a case of
traffic overload in the network tends to have many different traffic
types, and the traffic flows are thus highly uncorrelated, appearing
to be "random." Our algorithm samples distinct packet flows
(distinguished by their source and destination addresses) to
determine if there is a large amount of correlation between the
packets. If so, then all suspicious flows at the node are again
correlated with each other to determine that it is indeed an attack
and not a case of traffic overload. We compared our technique to a
simple packet counting algorithm for DDoS detection and found that
our technique is much more sensitive in detecting attacks. Complexity
differential is defined as the difference between the cumulative
complexities of individual packets and the total complexity computed
when those packets are concatenated to form a single packet. In
effect, we use the measure of the compressibility of the packets
accumulated in a given time interval to determine correlation. We
believe that it will also be much more accurate in separating false
alarms from true attacks. This set of experiments is underway.
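The complexity differential defined above can be sketched with a
general-purpose compressor standing in for the complexity estimator.
This is an illustration under assumptions, not the detection
algorithm implemented in this project: zlib is the assumed estimator,
the function names are hypothetical, and the sample payloads are
invented for the example.

```python
import os
import zlib
from typing import Sequence


def estimated_complexity(data: bytes) -> int:
    """Compressed length in bytes, used here as a stand-in for K(data)."""
    return len(zlib.compress(data, 9))


def complexity_differential(packets: Sequence[bytes]) -> int:
    """Sum of per-packet complexities minus the complexity of their concatenation.

    A large value means the packets share structure that is only visible
    when they are compressed together, i.e. the packets are correlated.
    """
    individual = sum(estimated_complexity(p) for p in packets)
    joint = estimated_complexity(b"".join(packets))
    return individual - joint


# Hypothetical attack-like traffic: near-identical payloads.
attack_like = [
    b"GET /index.html HTTP/1.0\r\nHost: target.example\r\n\r\n" + bytes([i])
    for i in range(50)
]
# Hypothetical overload traffic: unrelated payloads of the same size,
# modeled here as random bytes.
overload_like = [os.urandom(len(attack_like[0])) for _ in range(50)]

print(complexity_differential(attack_like))
print(complexity_differential(overload_like))
```

The correlated set produces a much larger differential than the
uncorrelated set. Part of each value comes from the compressor's fixed
per-call overhead, so in practice a detection threshold would have to
be calibrated against normal traffic rather than set at zero.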

Hypothesis  2 

It is hypothesized that the degree to which information can be
compressed algorithmically is a measure of the ease of understanding
the information, particularly by an attacker. This concept is used to
estimate the vulnerability of a system through given observable
points within the system. Apparent complexity has been defined in
this project as the complexity normalized to the prior knowledge of
an individual. Prior knowledge can be obtained automatically by
watching a potential attacker within a fishbowl, during an attack, or
assigned by other means. We believe that our technique is more
efficient, robust, and ubiquitously applicable than developing a
database of vulnerabilities and testing for all potential
vulnerabilities, as most people have been attempting.

1.  Complexity  probes  (in  the  form  of  Magician 
Active  Packets)  have  been  developed  and  a  test 
plan  put  together  that  would  help  verify  this 
hypothesis. 

Hypothesis  3 

It is hypothesized that the algorithmically compressed representation
of a network fault provides unique opportunities for seeding the
composition of solutions that mitigate the fault. Thus, the
relationship between complexity and fault/solution composition is
being explored for the development of self-organizing solutions.

1. A simple Mathematica simulation of an active network has been
developed that focuses on the algorithmic aspects of data encoded
within a packet. It abstracts away networking details, allowing a
focused study of the tradeoff in algorithmic versus static
information transmission.

2. A Mathematica-based Genetic Algorithm simulator has been
instrumented with complexity estimation. A decrease in complexity was
noted during the initial evolutionary stages of all genetic
algorithms tested. In other words, as optimal solutions evolved, the
measurement of the complexity of the total system decreased.

3. Previous work using genetic algorithms to both determine
Kolmogorov Complexity and generate algorithmic representations of
multimedia data was recently found in the literature and helps to
validate our approach.
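The link between population homogeneity and measured complexity can
be illustrated without running a full genetic algorithm. The sketch
below is not the Mathematica simulator described above; it assumes
bit-string individuals and zlib as the complexity estimator, and the
function name population_complexity is hypothetical. It compares a
diverse population with one that has converged on a single individual.

```python
import random
import zlib
from typing import Sequence


def population_complexity(population: Sequence[str]) -> int:
    """Compressed length in bytes of the whole population, concatenated."""
    return len(zlib.compress("".join(population).encode("ascii"), 9))


rng = random.Random(0)  # fixed seed so the illustration is repeatable

# Early generation (assumed): 32 unrelated random 64-bit individuals.
diverse = ["".join(rng.choice("01") for _ in range(64)) for _ in range(32)]

# Late generation (assumed): the population has converged on one individual.
converged = [diverse[0]] * 32

print(population_complexity(diverse))
print(population_complexity(converged))
```

The converged population compresses to far fewer bytes than the
diverse one, which is the effect a complexity probe would observe as
selection drives the population toward a common solution.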

1.2  PROJECT  CHALLENGES 

Accurate estimation of Kolmogorov Complexity is salient to its
ubiquitous application to network fault tolerance and security. We
will benchmark our new compression algorithm and estimator for K(x)
against other means in search of a model base under MML that is
effective in efficiently and accurately estimating Kolmogorov
Complexity.

With respect to the DDoS Kolmogorov Complexity application, its
performance will be compared to other detection algorithms that are
currently in use. In particular, its performance has to be measured
in terms of resource tradeoffs, detection and false-alarm
probability, and response time. It is hypothesized that other DDoS
detection techniques, while optimized for detecting certain types of
attacks, will not be as robust in detecting all types of attacks.

A challenge for this project is identifying or developing the
fundamental theory for composition of solutions using Kolmogorov
Complexity.

1.3  CHALLENGE  QUESTIONS 

1. How well can Kolmogorov Complexity be
estimated?

2. What are the benefits and tradeoffs associated
with algorithmic information transmission?

3. Can a network fault be represented
algorithmically?

4. Can the algorithmic representation of a network
fault seed the formation of optimal solutions?

5. What is the meaning of the minimal Turing
Machine generated from a compression
algorithm?

6. Can a tolerance be incorporated to trade off
computation and complexity estimation?

7. Is there a convergent form of complexity
estimation, i.e., one allowing the complexity to
converge to a value?

8. Can information fusion be accomplished more
efficiently using algorithmic forms of
information?

9. Could information that is in transit within the
network be combined so as to reduce complexity?
Example: bioinformatic data/algorithms
could be fused within the network from multiple
sources, and only those combinations leading to
lowest complexity kept, assuming that lowest
complexity indicates the most likely explanation.
Network complexity reduction: think of
complexity as energy; the network tries to find its
lowest energy state.

10. What are the fundamental theorems derivable
from Kolmogorov Complexity that can allow us
to define algorithms for self-composition?

11. Much work has been done in the past on
detecting and measuring the impact of faults. How can
one quantify and measure the impact of
solutions, which are of equal importance in
matching (composing) faults and solutions?

12. Chaos Theory views system operation in terms
of phase (state) space. Attractors are patterns in
phase space in which the system tends to remain
(a coherent organization). Think of attractors as
the gravitational pull keeping the network
together (or functioning properly). Because
attractors are patterns, they are highly
compressible (low complexity). Is a measure of the
system's ability to self-organize given by the ratio
of the size of the attractor to the size of the
operational phase space?

13. Consider algorithmic representation of faults.
Using reversible code (anti-code) it would be
possible to reverse the computation, i.e.,
eliminate the fault. I believe anti-code compilers
have been developed. What could one say about
the complexity of code and its corresponding
anti-code?
A key assumption of this work is that complexity and
information assurance are related. Clearly, the
system should appear complex to an attacker and
simple to a legitimate user. To anticipate questions
that an astute reader might have, this section is
written in the form of a question and answer dialog.
Our colleagues have raised some of these questions;
however, identities will not be revealed in order to
protect the innocent.

Question: Aren't higher complexity systems more
vulnerable to attack? How does that correlate with
an attacker following the path of least
vulnerability?

Answer: Assume a system can be represented by x. Its
Kolmogorov Complexity, K(x), is by definition the
size of the smallest program capable of generating
x. Thus, K(x) does not vary with the implementation
of the system. In Section 5.0, we discussed how the
brittleness of a system changes with respect to the
efficiency of the implementation. We define AK_r(x)
to be the apparent complexity of system x as viewed
by attacker r. While it is intuitively and
empirically true that more complex systems tend to
have more security problems, this is attributed to
the inability of the defender to understand their
own system fully and thus comprehend how best to
defend it. It is the differential between the
defender's understanding of a system and an
attacker's understanding that is a measure of true
information assurance. The desired goal is
simplicity for the defender and complexity for the
attacker.

Question: Suppose, as an attacker, I have found an
encryption key. The encrypted data appears very
complex, yet knowing the key, I can easily obtain
the information.

Answer: There is an estimate of complexity known as
Minimum Description Length that involves
compressing both the data and the hypothesis used to
generate that data. If the apparent complexity,
AK_r(x), is estimated for an attacker known to have
the encryption key, then the complexity will be
very low. The complexity of the encrypted data is
always K(x). The apparent complexity of the data
in the absence of the encryption key is much
greater than K(x). When an attacker gains the key,
the differential between K(x) and the apparent
complexity is dissolved and all security is lost.
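The key-versus-no-key differential described in this answer can be illustrated with a small sketch. It is only an illustration under stated assumptions: zlib's compressed-size ratio stands in for an apparent-complexity estimate, the SHA-256 counter-mode keystream is a toy cipher chosen for self-containment (not a vetted encryption scheme), and all function names and the sample key are hypothetical.

```python
import hashlib
import zlib


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key+counter (illustration only)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """XOR data with the keystream; applying it twice restores the data."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def apparent_complexity(data: bytes) -> float:
    """Compressed size divided by original size (a crude estimate)."""
    return len(zlib.compress(data, 9)) / len(data)


plaintext = b"GET /index.html HTTP/1.0\r\n" * 200
key = b"hypothetical-key"
ciphertext = xor_cipher(plaintext, key)

# Observer without the key sees only the ciphertext.
without_key = apparent_complexity(ciphertext)
# Observer with the key can first undo the encryption.
with_key = apparent_complexity(xor_cipher(ciphertext, key))
print(f"without key: {without_key:.3f}, with key: {with_key:.3f}")
```

Under these assumptions the ciphertext is essentially incompressible while the decrypted data compresses to a small fraction of its size, which is the differential that disappears once the attacker holds the key.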

Question: Wouldn't an attacker choose to hide inside
a more complex component than a simple one?

Answer: A similar question appears in the study of
work factor as an information assurance metric.
An attacker may be willing to spend more effort,
or take a higher complexity path, if he has the
time and has a suitably high interest in avoiding
detection. Again, the attacker can exploit the
defender's inability to understand their system.

Question: Wouldn't estimating the complexity of
every bit-stream in a system require a lot of
overhead? Is there a more efficient way?

Answer: An implementation of complexity-based
vulnerability analysis that requires bit-stream level
computation for every possible dataflow would
require a lot of overhead. One possible approach
is to look at more aggregate views of the system
and determine complexity from SNMP variables,
for example. However, consider that alternative
approaches not based on complexity that attempt
to include extreme detail quickly find the
problem to be overwhelming; in addition, such
approaches are generally easily broken if they
miss a particular detail.

Question: One of your estimates of complexity relies
upon the inverse compression ratio. Isn't that
based upon entropy rather than complexity?

Answer: Yes, our initial complexity estimation
technique relied on the inverse compression ratio.
This was chosen as a Kolmogorov estimator
because it appeared to be a low-overhead,
easy-to-implement technique.
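A minimal sketch of an inverse-compression-ratio estimator follows. It assumes zlib as the compressor and a hypothetical function name; the source does not say which compressor was used, and any compressor yields only an upper-bound style estimate of K(x), not K(x) itself.

```python
import os
import zlib


def inverse_compression_ratio(data: bytes) -> float:
    """Estimate complexity as compressed length / original length.

    Values near 0 suggest highly regular (low-complexity) data;
    values near 1 suggest data the compressor cannot shorten.
    """
    if not data:
        raise ValueError("cannot estimate complexity of empty input")
    return len(zlib.compress(data, 9)) / len(data)


regular = b"ab" * 4096          # highly patterned input
random_like = os.urandom(8192)  # input with no exploitable pattern

print(inverse_compression_ratio(regular))
print(inverse_compression_ratio(random_like))
```

The estimator is cheap, which matches the motivation given in the answer, but it inherits the statistical bias of the chosen compressor, which is the concern the question raises.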

Question: Since there are such sloppy bounds
associated with any estimate of K(x), isn't the ability to
measure and utilize conservation of complexity a
pipe dream?

Answer: Estimating Kolmogorov Complexity is a
challenge. Our current research is focused on finding
the best metrics and quantifying the bounds
associated with these metrics to determine the usability
of conservation of complexity to solve real
problems. Our hypothesis is that beyond certain
thresholds abnormal behavior will be noticeable
using conservation of complexity.
2.1 SURVEY OF RELEVANT EXISTING
SECURITY TECHNIQUES AND THEORY

A fundamental basis for information security is
elusive. Numerous theories have come to light lately
that look for a fundamental basis for the study of
security systems. In this chapter we review some of
the more fundamental work in this area. We
conclude this chapter with two analogies, to
Thermodynamics and to Electrical Engineering, that yield
an intuitively pleasing basis that we would like to
explore.

Bennett/Zurek

Physics of information has been studied for decades
with many interesting theoretical contributions by
Zurek et al. [117] and Bennett et al. [99]. This body of
work identifies Kolmogorov Complexity as a basic
property inherent in the physics of information, and
strives to resolve actual physical laws of energy with
information laws. Quantum computing is a related
area. Our work applies many of the concepts
introduced by Bennett and Zurek to the Information
Security domain.

Harmon 

Reference [24] describes a recently developed
model of information systems where the fundamental
devices are processors, routers, memory components
and communication components that serve to
affect information in the form of modulated energy.
Assumptions and postulates are well laid out to
provide possible future experimental validation of this
model. System complexity is defined as the number
of dependencies that exist between pieces of
information. Our approach differs from this approach in
that we will use the fundamental quantity of
Kolmogorov Complexity as our basic building block.

Fisher  Information 

Reference [125] puts forth a unification of the laws
of physics and the statistical quantity known as
Fisher Information. This shares with our approach
the gravitation towards a fundamental parameter, in
this case Fisher Information, which applies locally to
specific data as opposed to general source
distributions from which data are generated, as is the case
with Shannon entropy. Fisher information is defined
in terms of A, the likelihood function of Z, a set of
observations, given x, a time-invariant parameter
measured by observation set Z. A resolution of our
approach to these results is desired.
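For orientation, the textbook form of Fisher information can be written with the symbols named in this paragraph. This is a sketch of the standard definition, not necessarily the exact form used in Reference [125]; the symbol I_F and the density notation p(Z | x) are assumptions introduced here.

```latex
A(Z \mid x) = p(Z \mid x), \qquad
I_F(x) = \mathrm{E}\!\left[\left(\frac{\partial}{\partial x}
         \ln A(Z \mid x)\right)^{2}\right]
```

The expectation is taken over the observations Z for a fixed value of the parameter x, which is why the quantity is local to a specific parameter value rather than a property of a source distribution as a whole.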


Current  Security  Techniques 

Information security (or lack thereof) is too often
dealt with after security has been lost. Back doors are
opened, Trojan horses are placed, passwords are
guessed and firewalls are pierced; in general,
security is lost as barriers to hostile attackers are
breached and one is put in the undesirable position
of detecting and patching holes. In fact many holes
go undetected. Breaches in other complex systems
that people care about are not handled in such an
inept manner. Thermodynamic systems, for
example, can be assured of their integrity by the pressure,
heat or mass the system contains. Hydrostatic tests
can be performed to ensure that there are no
"holes," and the general health of the system can be
ascertained by measuring certain parameters. A
problem is identified as soon as the temperature or
pressure drops, and immediately one can take action
both to correct the problem and to isolate other
areas of the system from harm. But how does one
perform a hydrostatic test of an information system?
What conserved parameters exist to measure the
health or vulnerability of the system? How can one
couple the daunting task of providing a system
where vulnerabilities are readily measurable with the
need for simplicity of use for authorized
users? We explore these issues through various
analogies and propose that only through monitoring
objective quantities inherently related to
information itself can the science of information assurance
move beyond patching holes.

Analogy  to  Thermodynamics 

Thermodynamics offers an attractive analogy for an
information security system. In thermodynamic
systems, laws of conservation and
energy flow allow monitoring of the health of the
system through parameters such as temperature,
heat, and volume. One does not, for example, in a
thermodynamic system, wait for all the heat to drain
from a heat exchanger and a rat to come inside to
announce that there is a problem. One can tell from
parameters such as temperature and pressure that
the system is behaving abnormally. Concepts such as
entropy and mass map nicely to the information
security domain. Through our exploration of
Kolmogorov Complexity, we pursue the analogy to
thermodynamics.

Analogy  to  electrical  engineering 

Analogies have been drawn between basic electrical
engineering parameters, such as impedance,
current, and voltage, and the information assurance of an
information system. Electrical current could be
compared to information flow to dishonest participants.
High resistance or good insulation on an electrical
cable could represent a network with few holes.
Through sufficient mappings, some of which are
shown in Table 1, simplified models, parallel to
Norton and Thevenin equivalent circuits, could be
developed through simple measurements and
transformations. Additionally, the large body of work
dedicated to detecting and protecting against electrical
faults and disturbances could provide some benefit
to the field of information assurance.


Table 1 Electrical and information assurance
properties

Electrical property          Security property
---------------------------  ----------------------------------
Current                      Data flow to or from dishonest
                             participants
Voltage                      Information assurance potential
Resistance                   Resistance to data flow to
                             dishonest participants
Inductance and capacitance   These values follow by direct
                             insertion of the above analogies
                             into the electrical definitions of
                             inductance and capacitance.

Here the information assurance potential is a
measure of the ability of a system to defend, or of a
perpetrator to intrude upon, an information system. The
equivalent resistance of the system is determined by
considering and quantifying all of the possible
systems (Figure 1). A distinction is made between
active networks [3] and today's legacy, or passive,
networks in this proposed electrical engineering
paradigm. The work involved in forwarding a packet,
whether active or passive, is current that can cause a
node to do work, much as current in a motor winding
causes energy transfer in a different form. With
regard to active packets and information theory,
passive data is simple Shannon-compressed data, and
active packets are combinations of data and programs
whose efficiency can be estimated through
Kolmogorov Complexity. Information assurance laws must
be able to deal with many alternative representations
of information. Section 3 discusses an electrical
engineering grid type of information assurance tool.


Figure  1.  Electrical  and  Information  Assurance 
Properties. 


2.2  THE  NETWORK  INSECURITY  PATH 
ANALYSIS  TOOL  (NIPAT) 

Consider a specific grid-based information
assurance tool known as the Network Insecurity Path
Analysis Tool (NIPAT). NIPAT is a powerful security
analysis tool developed at GE Global Research that
has been improved by the results of this project in
complexity-based vulnerability analysis. NIPAT
serves as a positive representative of grid-based
information assurance tools in general. Section 3.1
discusses their weaknesses. Figure 2 displays 2,000
vulnerabilities found on a few nodes of a network
that were thought to be reasonably secure.
Vulnerabilities are displayed in Figure 2 by host and type.
The number along each edge of the graph
represents the number of opportunities available to the
attacker to reach the next vulnerability. NIPAT can
automatically generate a directed graph
representing the security vulnerabilities of a network. This
information is gathered from network security
software agents. The security vulnerability graph for a
typical network can be extremely dense; however,
the object-oriented nature of the security model is
useful in choosing the level of abstraction required.
For example, it may be possible to display the
vulnerability graph for Unix hosts in general and to hide
the details of individual Unix variants. NIPAT
determines the degree to which specified targets within
the network can be compromised. The vulnerability
chain is displayed as a directed graph. Nodes
represent vulnerabilities whose security may be
compromised, and edges represent paths from vulnerability
to vulnerability. The larger the value of the edge
label, the greater the vulnerability. The focus of this
effort is on the mathematical representation of
information assurance; thus, the underlying
database and data gathering agents are not discussed in
detail here. See Appendix C for more on the
operation of the NIPAT tool.

Figure 2. A Grid-Based Tool in Action.

The information assurance model assumed by
NIPAT is that of an attacker who has a finite amount
of resources with which to penetrate network
security. A resource vector for the attacker is assumed.
The cost to the attacker of using each of the
resources against a particular network is defined by
consumption functions. The cost to network security
of implementing security measures is defined by a
k-dimensional security function. The attacker's
resource vector consists of the strength of each
element of the attacker's resources. For example, the
password decryption resource value would consist of
the attacker's CPU speed and amount of time the
attacker would be willing to spend on the attack.

The NFS spoofing resource value would be the time
to install and run the NFS spoofing software
multiplied by the probability that the attacker has access
to such software. The host spoofing resource value
would be a function of the attacker's ability to evade
the physical security of a network and install or
modify a host IP address. The consumption function
vector is the complement of the attacker's resource
vector. For example, a network with good password
encryption algorithms or whose users use
well-chosen passwords will have a high value for the
consumption function for password decryption. Clearly,
attempting to define all possible security threats to
any system is a huge undertaking. However, the
scope of network security is confined to the security
object model. In this example, the following
techniques are assumed to be available to the attacker:
password decryption, NFS spoofing, host spoofing,
and application security faults. Password decryption
assumes the attacker has a program capable of
decrypting users' passwords. NFS spoofing involves
violating security to mount another user's file
system; host spoofing is causing a host to appear to the
network as a different host; and an application
security fault is taking advantage of an
application-programming fault in order to attack the security of a
system. Each of these resources is measured in units
of time. Thus, an attacker with a powerful computer
and a willingness to wait a long period of time to
break into a network will have a large password
decryption resource. An attacker with physical
access to the network and competent knowledge will
have a high host spoofing resource value because
such an attacker can physically connect a host to the
network. An attacker with much experience and
knowledge of applications will have a large
application security fault resource.

Another form of vulnerability analysis involves
detecting vulnerabilities that change over time. The
network monitoring tool quantifies the vulnerability
of a system in terms of the percent of patches that fail
to have the correct signature, the percent of files that
are accessible to others besides the owner, and the
percent of passwords that can be guessed with a given
password generation tool. Clearly, vulnerability
checks such as these increase the security of the
network. Both the type of information gathered and the
frequency with which the information is updated
quantify the effectiveness of a network monitoring
strategy. If the information is not updated frequently
enough, an attacker may have penetrated network
security and left before network security is aware of
the situation. An estimate of the effectiveness of the
monitoring system is based on a profile of network
security attacks on the Internet and the following
parameters: time to monitor patches, Trojan horses,
passwords, and any other vulnerabilities. The attack
rate is assumed to be Poisson. The average attack
rate, based on Internet incident reports from an
anonymous site for a six-year period, is five attacks
per month. Also, the Defense Information Systems
Agency has determined by experimental means
[107] that only 0.7% of incidents are actually
reported. Thus, for each path in the network
security vulnerability chain, the cost to the attacker is the
probability of being detected multiplied by the cost
function that the additional monitoring provides.
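The monitoring-frequency argument can be made concrete with a short Poisson sketch. Two assumptions are introduced here and are not stated in the source: that the five attacks per month are reported incidents, so the underlying rate is obtained by dividing by the 0.7% reporting fraction, and that a month is 30 days. The function names are hypothetical.

```python
import math

REPORTED_ATTACKS_PER_MONTH = 5.0   # from the incident reports cited above
REPORTING_FRACTION = 0.007         # 0.7% of incidents are reported
HOURS_PER_MONTH = 30 * 24          # assumption: 30-day month


def actual_attack_rate(reported_rate: float, reporting_fraction: float) -> float:
    """Scale the reported rate up by the fraction of incidents reported."""
    return reported_rate / reporting_fraction


def prob_attack_within(rate_per_month: float, interval_months: float) -> float:
    """P(at least one Poisson arrival in the interval) = 1 - exp(-rate * t)."""
    return 1.0 - math.exp(-rate_per_month * interval_months)


rate = actual_attack_rate(REPORTED_ATTACKS_PER_MONTH, REPORTING_FRACTION)
p_hour = prob_attack_within(rate, 1.0 / HOURS_PER_MONTH)
print(f"estimated attacks per month: {rate:.1f}")
print(f"P(attack between hourly polls): {p_hour:.3f}")
```

Under these assumptions even an hourly polling interval leaves a better-than-even chance that an attack begins between two polls, which illustrates why the update frequency matters as much as the type of information gathered.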

The following list describes the capabilities and
benefits that a vulnerability assessment tool could
provide:

1. An automated network security assessment
tool should be capable of automatically
generating a directed graph of vulnerabilities. This
information can be gathered from network security
software agents. The security vulnerability graph for
a typical network can be extremely dense; however,
the object-oriented nature of the security model may
be useful in choosing the level of abstraction
required. For example, it may be possible to display
the vulnerability graph for Unix hosts in general and
to hide the details of individual Unix variants.

2. The network security vulnerability tool
determines the degree to which the network security can
be compromised, based on minimizing an objective
function that represents the cost to the attacker.

3. Based on the vulnerability graph, the optimal
deployment location and capability mix of the
security agents is determined. Note that this is a closed
feedback loop: the security agents send information
to the security analysis tool, which controls the
deployment of the agents.

4. The network security vulnerability
assessment tool should indicate the degradation in
the quality of service to legitimate network users as
security countermeasures are taken, as well as
dynamically indicate the security vulnerability of the
network.

An object-oriented prototype network
vulnerability analysis tool, NIPAT, which implements
most of the above requirements, has been
written in Java. The vulnerability chain is
displayed as a directed graph. Nodes represent entities
whose security may be compromised, and paths
represent the vulnerability of an entity. The larger the
value of the path label, the greater the vulnerability.

Mathematica [139] provides an ideal environment
for experimenting with symbolic mathematical
concepts. The adjacency matrix, which represents
the vulnerability graph from NIPAT, can be read into
Mathematica. The directed, weighted adjacency
matrix is used to determine the shortest path
between every two nodes. The insecurity values can
be displayed as a contour map and a density plot,
where high areas in the topological view are secure,
and those lower are relatively less secure, as shown in
Figure 3 and Figure 4 for the system shown in
Figure 2.
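The all-pairs shortest-path step can be sketched as follows. The source performs it in Mathematica and does not name the algorithm; Floyd-Warshall is swapped in here as one standard choice, written in Python for illustration. The example matrix is hypothetical, and treating each edge weight as a cost to be minimized is an assumption about how NIPAT's edge labels are transformed before this step.

```python
import math

INF = math.inf  # marks the absence of a direct edge


def all_pairs_shortest_paths(weights):
    """Floyd-Warshall on a directed, weighted adjacency matrix.

    weights[i][j] is the cost of the edge i -> j, or INF if absent.
    Returns a new matrix of minimum path costs between every pair.
    """
    n = len(weights)
    dist = [list(row) for row in weights]
    for i in range(n):
        dist[i][i] = min(dist[i][i], 0.0)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                through_k = dist[i][k] + dist[k][j]
                if through_k < dist[i][j]:
                    dist[i][j] = through_k
    return dist


# Hypothetical four-node vulnerability graph.
example = [
    [0, 2, INF, INF],
    [INF, 0, 3, 8],
    [INF, INF, 0, 1],
    [INF, INF, INF, 0],
]
shortest = all_pairs_shortest_paths(example)
print(shortest[0][3])
```

The resulting matrix is the kind of per-pair value that could then be rendered as a contour map or density plot.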


Figure  3.  Topographical  Map  of 
Security. 


Figure  4.  Density  Graph  of  Secu¬ 
rity. 

As has already been mentioned, extant forms of
automated security vulnerability analysis rely on
polling, which becomes infeasible in large-scale
networks and in highly dynamic environments. Other
approaches towards vulnerability analysis and
intrusion detection need to be developed. Two
independent research efforts are leading
towards a mutually beneficial solution to the
vulnerability assessment and network security problem:
the human biological immune system approach to
network security, and Active Networks [3]. Active
networking implements the cliche that "the network
is the computer." Active networking allows users of
the computer communications network to inject
programs into the network to customize processing of
user- and application-specific data. Thus, just as
hormones control and regulate biological systems,
active networks allow programs to travel the network
modifying security behavior. The biological analog
of intrusion detection is highly distributed. The
advantage of a distributed intrusion detection system
is that the probability of detecting an intruder
increases significantly as the intruder is forced to
pass through more independently operated intrusion
detection systems. Biologically inspired forms of
vulnerability quantification would include injecting
a network with a harmless virus and measuring how
far it can spread throughout the network. This would
clearly indicate the location of vulnerabilities. A
more aggressive solution would involve "growing"
many simple cells (processes) in a closed computer
environment. These cells (processes) constantly
mutate, reproduce when they successfully attack an
intruder (non-self), and die when they attack
legitimate system and user processes (self). Over time,
by natural selection, only useful processes will
remain, which can be injected into a network and
used to detect and attack intruders. Clearly, active
networking enables new and more flexible security
safeguards in addition to facilitating the development
of the immunological approach towards network
security.
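The claim that detection probability grows with the number of independently operated intrusion detection systems follows from elementary probability. The sketch below assumes the systems detect independently and uses a hypothetical per-system detection probability; neither figure comes from the source.

```python
def detection_probability(per_system_probs):
    """Probability that at least one independent detector fires.

    The intruder escapes only if every system misses, so
    P(detect) = 1 - product(1 - p_i).
    """
    miss_all = 1.0
    for p in per_system_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
        miss_all *= 1.0 - p
    return 1.0 - miss_all


# Hypothetical: each system alone detects the intruder 30% of the time.
for hops in (1, 2, 5, 10):
    print(hops, round(detection_probability([0.3] * hops), 4))
```

Even weak individual detectors compound quickly under the independence assumption, which is the advantage the distributed, biologically inspired approach relies on.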

A network security analyst can allocate security
safeguards in order to minimize the entire network
vulnerability, or to minimize the vulnerability from
known attack points to particular targets. A quick
study using NIPAT is presented. First, from a
fundamental network vulnerability flow viewpoint, the
strategy of allocating safeguards in combinations of
serial and parallel strategies can be examined.

Figure 5. An Example of Security Safe Guard Assumption.

Figure 5 shows NIPAT analyzing an attack from host A
to host B. In this case, the number of opportunities
has been normalized into probabilities. Figure 6
shows the results as security safeguards are removed.
The solid line is the vulnerability of a single
connection from the attacker to the defender having the
same vulnerability flow as the links shown in Figure 5.
With a probability of less than 0.6, a diversity of
vulnerability types helps to increase security but,
interestingly, above 0.6 it does not.
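A crossover near 0.6 can be reproduced with a simple series-parallel calculation. The topology used below, two parallel attack paths each consisting of two vulnerabilities in series with every link having the same success probability p, is an assumption made for illustration; the source does not spell out the arrangement in the figure.

```python
def series(probs):
    """All links on a path must be exploited: multiply probabilities."""
    result = 1.0
    for p in probs:
        result *= p
    return result


def parallel(probs):
    """Any one path suffices: 1 - product of failure probabilities."""
    fail_all = 1.0
    for p in probs:
        fail_all *= 1.0 - p
    return 1.0 - fail_all


def two_by_two(p):
    """Assumed topology: two parallel paths of two serial links each."""
    return parallel([series([p, p]), series([p, p])])


def crossover(lo=0.01, hi=0.99, tol=1e-9):
    """Bisect for the p where the composite equals a single link."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if two_by_two(mid) - mid < 0.0:
            lo = mid   # composite still less vulnerable than one link
        else:
            hi = mid
    return (lo + hi) / 2.0


print(two_by_two(0.5), two_by_two(0.8), crossover())
```

For this assumed arrangement the composite network is less vulnerable than a single link when p is below roughly 0.618 and more vulnerable above it, which is consistent with the threshold quoted in the text.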


Security  Safe-Guard  Allocation 


Figure  6.  Series  versus  Parallel  Vulnerability  Attack. 


Let us assume that vulnerability has been
calculated by NIPAT to be either the maximum insecurity
flow or probability of successful attack, where S
represents security safeguards, C(S) is the cost of
security, and L is the cost constraint or some other hard
resource limit. Next we discuss the cost in terms of
impact on users; here it is strictly a financial cost or
other resource constraint. Objective Function 3.1
shows how the optimal security safeguard allocations
can be determined.
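One plausible reading of this constrained allocation problem, written with the symbols defined in the paragraph, is sketched below. The notation V(S) for the vulnerability computed by NIPAT under safeguard allocation S is an assumption introduced here, and the exact form of Objective Function 3.1 may differ.

```latex
\min_{S} \; V(S) \quad \text{subject to} \quad C(S) \le L
```

Read this way, the analyst searches over safeguard allocations S for the one that minimizes vulnerability without exceeding the cost or resource limit L.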

It is possible to use NIPAT to study various
strategies of both defensive and offensive players in a
network attack. Once an attack has been detected, the
network command and control center can respond
to the attack by repositioning security safeguards
and by modifying services used by the attacker.
However, cutting off services to the attacker also impacts
legitimate network users, and a careful balance must
be maintained between minimizing the threat from
the attack and maximizing service to customers. For
example, various stages of an attack are shown in
NIPAT in Figure 7 along the yellow path. Since the
allocation of security resources never changes
throughout the attack, the vulnerability of the target
increases significantly with each step of the attack.

Our proposed enhancement would be to
incorporate the following algorithm into NIPAT. Let GS
represent the network service to customers, with a
minimum accepted quality, Q. Let V(S,A) be the
vulnerability of the network to a particular attacker, A.
Then Objective Function 3.2 shows the optimal
network response given the current state of the attack.
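The response-selection idea, choosing safeguards that minimize vulnerability while keeping service at or above the minimum quality Q, can be sketched as a small exhaustive search. Everything concrete below is hypothetical: the safeguard names, the multiplicative vulnerability model, the additive service-loss model, and the numbers are illustrative stand-ins, not the contents of Objective Function 3.2.

```python
from itertools import combinations

# Hypothetical safeguards: (name, vulnerability multiplier, service lost)
SAFEGUARDS = [
    ("block_ftp", 0.5, 0.20),
    ("rate_limit", 0.8, 0.05),
    ("close_nfs", 0.6, 0.30),
]


def vulnerability(selected, base=0.9):
    """Assumed model: each safeguard scales the base vulnerability."""
    v = base
    for _, factor, _ in selected:
        v *= factor
    return v


def service(selected):
    """Assumed model: each safeguard subtracts from full service (1.0)."""
    return 1.0 - sum(loss for _, _, loss in selected)


def best_response(min_quality):
    """Enumerate all subsets; keep the least vulnerable feasible one."""
    best = None
    for r in range(len(SAFEGUARDS) + 1):
        for subset in combinations(SAFEGUARDS, r):
            if service(subset) < min_quality:
                continue  # violates the service constraint
            v = vulnerability(subset)
            if best is None or v < best[0]:
                best = (v, subset)
    if best is None:
        raise ValueError("no allocation meets the required quality")
    return best[0], [name for name, _, _ in best[1]]


print(best_response(0.7))
```

Exhaustive enumeration is only workable for a handful of safeguards; it is used here to keep the constraint and objective visible rather than as a recommendation for large networks.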

The results of this proposed research will include
a better understanding of how to respond to
network security attacks. In addition, the existing NIPAT
will be extended with new algorithms, serving as
experimental validation of the results from this
project.

Figure 7. Various stages of attack.

Failure  of  the  grid-based  approach 

The grid-based approach, limited to the capabilities previously discussed, has a considerable number of shortcomings. The first is the inability of the grid-based mechanisms, as presented above, to assign meaningful initial values that represent either security or insecurity. The current implementation of NIPAT uses scalar values that represent the "strength" of an attacker and the number of opportunities for an attacker to exploit a chain of a priori identified vulnerabilities. The reasoning in the development of NIPAT is that the strength of an attacker is a representation of the attacker's power in terms of combined instructions per second, advanced knowledge of the system under attack, and skill in the use of attack strategies. It has been proposed to augment NIPAT with vectors, where each element represents an attacker's strength in exploiting various predefined vulnerabilities. However, this assumes advanced knowledge of all possible vulnerabilities and of the attacker's strength in exploiting each of those vulnerabilities. This is not a reasonable assumption for a system of even low complexity. Using a database, expert system, or object-oriented abstraction to handle aggregations of vulnerabilities does not lead to a feasible solution, because these mechanisms require that all possible vulnerabilities be known a priori. A more general vulnerability discovery and quantification technique is necessary.

It is our belief that many such tools, NIPAT among them, are salvageable as information assurance design tools. The good qualities of NIPAT, such as safeguard optimization and likely attack path identification, particularly to lead an attacker to a fishbowl, are useful mechanisms for information assurance design. To provide a brief preview of our proposed solution for grid-based tools, consider treating the resistance in the electronic circuit analogy of information assurance as complexity, with complexity and resistance directly proportional. The relationship among vulnerability, resistance, and complexity is developed in more detail later in this report. In Section 3.3 we look at the properties required of a meaningful information assurance metric.

Fundamental  properties  and  parameters  of 
information 

As discussed in the introduction, we desire to move the study of information assurance to a fundamental domain, where attacks need not be defined in advance. But what are the fundamental properties of information, and how can we build upon them to achieve a science for the assurance of this information? We discuss below some basic properties of information that are candidates for fundamental parameters upon which to build.

Size: In his groundbreaking 1949 paper, Shannon introduces fundamental tradeoffs and limitations on the ability to transmit information across a channel disturbed by Additive White Gaussian Noise (AWGN) [98]. This launched the science of information theory that has transformed the study of communications and coding of information, bringing the use of the term "bit" of information, which Shannon credits to J. W. Tukey, into the mainstream literature. The idea that information can be quantized into bits (or sequences of yes or no answers to questions) is now well accepted, and one measure of the size of information is the number of bits used to convey the information. Information compression coding, both lossless and lossy, as well as forward error correction coding, alters the size of the information in terms of bits by removing or adding redundancy. However, the unit of size, bits, is the term used to discuss the size of information, whether it is efficiently coded or not, error prone or self-correcting. Thus, while it is possible for information to change size without altering content, size is a fundamental property of information that should come into play under a set of fundamental laws of information assurance.
Entropy: Shannon entropy [179] is a fundamental property of information that measures the uncertainty of a random variable X based on the probabilities of each outcome:

$$H(X) = -\sum_{x} p(x) \log_2 p(x)$$

Entropy therefore relates to the source distribution of a random variable. Kolmogorov complexity is a related parameter, discussed in detail later, that relates to a specific sequence of information. These two parameters are extremely powerful properties of information that occur at the most fundamental level.
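As a small illustration of the entropy property, the following sketch computes Shannon entropy in bits, both for an explicit probability distribution and for the empirical symbol frequencies of a sequence. The function names are hypothetical and are not taken from the report.

```python
import math
from collections import Counter


def shannon_entropy(probabilities):
    """Entropy in bits of a discrete distribution given as probabilities."""
    # Terms with p == 0 contribute nothing and are skipped to avoid log(0).
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def empirical_entropy(sequence):
    """Entropy in bits per symbol of the observed symbol frequencies."""
    counts = Counter(sequence)
    total = len(sequence)
    return shannon_entropy(c / total for c in counts.values())
```

Note that the empirical estimate depends only on symbol frequencies, so two sequences with the same symbol counts but very different structure receive the same value. This is one reason the report turns to a per-sequence measure such as Kolmogorov complexity.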

Density, Mass or Energy: Density, mass and energy are properties of matter that have parallel and intuitively pleasing meanings in the domain of information. Density, like Kolmogorov Complexity, may measure the ability of a sequence to be compressed. Mass may simply represent the number of ones in a sequence, and energy, as in thermodynamics, may tie together quantities such as mass, density or entropy. The overriding goal is to find parameters that can be observed directly from the information sequences themselves and to compare objective quantities on which to base the science of information assurance.
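The following sketch shows how two of these candidate parameters might be observed directly from a sequence. It assumes "mass" means the count of ones in a binary string, as suggested above, and it uses the zlib compressor as a stand-in for compressibility. Kolmogorov complexity itself is not computable, so a general-purpose compressor gives only an upper-bound estimate, and the function names are hypothetical.

```python
import zlib


def mass(bits: str) -> int:
    """Number of ones in a binary string such as '010011'."""
    return bits.count("1")


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size.

    Values near 0 indicate highly compressible (low-complexity) data;
    values near or above 1 indicate data the compressor cannot shrink.
    """
    if not data:
        raise ValueError("data must be non-empty")
    return len(zlib.compress(data, 9)) / len(data)
```

A long run of identical bytes yields a ratio close to zero, while random bytes yield a ratio of about one.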

Towards  complexity-based  information 
assurance 

Beginning with a high-level view of the problem definition, Figure 8 shows both the secure manner of operation and insecure operation. Both manners of operation exist in the space of all possible forms of operation, M. Insecure operation, M_I, consists of those methods of operation that allow an information warfare aggressor entrance or access to control points into the information system. The intended secure operation areas, M_S, are well known, and some of the insecure paths are also known. Note that M_S and M_I can, and usually do, overlap. However, the entire area of operation can be extremely large, and an exhaustive search for all insecure operation is not feasible.

Figure 8. Set Theory View of Secure Operation. (Figure label: M_S = Secure Operations.)

In Figure 8, Euclidean distance corresponds to the degree of security. This leads one to consider a metric space upon which to base information assurance. The initial approach assumes only that the metric has the characteristics of a metric in the mathematical sense, as shown in Definition 3.1, where d is distance and p and y are points. Point p and point y have not been explicitly defined. As illustrated in the left side of Figure 8, an information system decomposed into many operating components could have a surface area as shown on the right side of Figure 8. Note that this surface is likely to change as a function of time; however, the time indices are not written for now. The points p and y are assumed to be relative to some absolute value; p and y can be security values either in different locations or at different time instances of the system.

If d is a measure of security, then Definition 3.1 implies that there is no difference in security between the same point and itself; however, there must be a difference between any two distinct points in the security space. Definition 3.1 also states that the measure between any two points in this space should be the same regardless of the order in which one takes the measurement. This means that, observed from a common vantage point, if security is measured at two different points in this space, p and y, then the measure of security will be the same regardless of the order in which the points are entered in the measure. It does not imply anything about the strength of an attack from p to y or an attack from y to p. It means, for example, that if p is less than y, then an attack from outside the system against p will be more likely to succeed than an attack against y. Finally, Definition 3.1 states that the distance between any two points will be less than or equal to the sum of the distances between each of those points and a common third point. Again, remember that this is a measure of security taken from a view outside the system of a potential attack from outside the system. As discussed in more detail in the remainder of this report, the actual measure will change as an attacker penetrates the system and as the attacker gains more knowledge of the system.
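The three properties attributed to Definition 3.1 above (zero distance only between a point and itself, symmetry, and the triangle inequality) are the standard metric axioms. As a hedged sketch, a candidate security measure could be checked against them on a finite sample of points as follows. The function name and the tolerance parameter are assumptions for illustration, and passing the check on a sample does not prove that the axioms hold on the whole space.

```python
from itertools import product


def is_metric(points, d, tol=1e-12):
    """Check the metric axioms for distance function d on a finite point set."""
    pts = list(points)
    for p, y in product(pts, repeat=2):
        if p == y and abs(d(p, y)) > tol:
            return False  # a point must be at distance zero from itself
        if p != y and d(p, y) <= tol:
            return False  # distinct points must be a positive distance apart
        if abs(d(p, y) - d(y, p)) > tol:
            return False  # order of measurement must not matter
    for p, y, z in product(pts, repeat=3):
        if d(p, y) > d(p, z) + d(z, y) + tol:
            return False  # triangle inequality through a common third point
    return True
```

For example, the absolute difference of scalar security values passes the check, whereas the squared difference fails the triangle inequality and a one-sided difference such as `max(a - b, 0)` fails both positivity and symmetry.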

In Figure 3 and Figure 4, a topographical plot and a density plot show the security of the system in Figure 2. These graphs are only suggested means of viewing information assurance, not a recommendation. Summing the attack strength at each node from all other nodes generates the graphs. Thus, the topology, or density, is the vulnerability of a particular area of the graph to all attacks. The light areas in the density plot and the higher areas in the topology map are areas of high vulnerability, while the darker areas, or lower areas on the topology map, are areas that are well secured. Remember that these are graphs of known vulnerabilities and the likelihood that they will be penetrated. The problem with these graphs is two-fold: what is the metric used to obtain the insecurity for each vulnerability, and how can it be assured that all vulnerabilities have been included in the graphs? Maps such as these require that Definition 3.1 be satisfied.

Information  assurance  via  set  theory  and 
complexity 

If information assurance can be proven to reside in a metric space, or alternatively, if a metric space can be chosen in which information assurance can reside, then principles of mathematical analysis [100] can be used to rigorously determine more detailed characteristics. For example, M can be extremely large, possibly infinite. Are M_S, or conversely, M_I, open sets? If so, can limit points be defined? What does an open set mean with regard to information assurance and security? As a simple example, consider a password protection system. Each character that a legitimate user of the system adds to a password increases the number of possibilities that a brute force (non-dictionary) attack would have to try in order to guess the password. Thus, the longer the password, the more secure the system. While an infinite length password is not possible, the security does begin to approach a limit point. This can also be seen, for example, in any security safeguard that works via the addition of complexity (that is, adding more states to the Turing Machine to increase security). This approach towards safeguard design approaches a limit point but can never reach perfect security. However, in general, this appears to be the only known approach, and thus limit points must exist. Assurance is usually increased by increasing the apparent complexity of access for potential attackers while providing legitimate users the apparent least complex, or in some sense shortest, path to access information. The complexity approach is carried forward in more detail in Section 6.
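The password example can be made concrete with a short calculation. The sketch below assumes a fixed alphabet size and a uniformly random password, which are simplifications not stated in the report, and the function names are hypothetical. It shows that the chance of a successful guess shrinks with every added character but never reaches zero, which is the limit-point behavior described above.

```python
def brute_force_candidates(alphabet_size: int, length: int) -> int:
    """Number of distinct passwords of exactly `length` characters."""
    return alphabet_size ** length


def candidates_up_to(alphabet_size: int, max_length: int) -> int:
    """Number of distinct passwords of length 1 through max_length."""
    return sum(alphabet_size ** n for n in range(1, max_length + 1))


def guess_probability(alphabet_size: int, length: int, guesses: int) -> float:
    """Chance that `guesses` distinct tries hit a uniformly random password."""
    return min(1.0, guesses / brute_force_candidates(alphabet_size, length))
```

With a 26-letter alphabet, each extra character multiplies the search space by 26, so the guess probability decreases geometrically towards zero as the length grows.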

Topological  space  for  information  assurance 

By definition, an open set, E, is one in which every point is an interior point. A point, p, is an interior point of E if there is a neighborhood, N, of p such that N ⊂ E. A neighborhood N_r(p) of point p consists of all points y such that d(p, y) < r, where r is called the radius of the neighborhood. If security, as determined by a given metric, is an open set, then there are significant implications. The best that can be hoped for in such a case is to determine limit points, because a distinct boundary between security and insecurity would not exist. Will it be the case that adding layers of security is much like adding "open covers"; that is, the result can never be perfect security, but rather an approach to a limit point? The complement of an open set is closed; what does that imply for assessment of insecurity? NIPAT takes both probabilistic and maximum flow approaches to computing network insecurity flows. This tool is incomplete for at least two reasons: it assumes that all vulnerabilities have been identified and measured, and it assumes that the vulnerabilities can be manipulated as discrete, closed sets. In order to determine whether such measurements can be applied to information assurance, consider topology, metric spaces, and the fundamentals of measurement theory in more detail. The definition below shows how the topology is induced by a metric d.

In the definition above, the topology is a collection of subsets of X such that the empty set and X itself belong to the collection, any finite intersection of members of the collection is in the collection, and any union of members of the collection is in the collection. For purposes of removing unnecessary detail, assume that the information system is a Turing Machine. Also assume that the vulnerability analysis tool in [43] displays vulnerabilities within the Turing Machine. Lemma 3.1 illustrates the topology that will be induced.

The intuitive notion is that d represents the ease of movement of an intruder from one vulnerability to another, where $d(x,y) : X \times X \to \Re$. A simple metric, as discussed previously, is to define d as the number of state/transition sequences within a Turing Machine representation of a system which an intruder can follow to move from vulnerability x to vulnerability y, or equivalently, the cardinality of the set V from Definition 3.1. In this induced metric space, the NIPAT vulnerability tool can be considered an overlay of the Turing Machine representation of the system. The NIPAT tool filters out the state and transition details and shows only the direct connection among the vulnerabilities. Does information assurance reside within this metric space? One test would be whether the metric supports the design tradeoffs required in determining brittleness in the design of the system.

To answer the above question, let A_cv be the set of currently exploited vulnerabilities. Most information security approaches, including the one above, assume that all vulnerabilities have been discovered and measured. This can never be assumed to be the case. Performance, a, from Definition 6.1 is an open set, and as new security holes are discovered,

$$\lim |A_{cv}| \to \infty$$

If V represents vulnerability and is open, then secure operation, the complement of V, is closed. Assume that $\sup_{x} d(x, x_0) < \infty$ for any $x_0$. Note that x is now an element of the set of secure operation. In other words, the number of secure operations is bounded. It is well known that, in Euclidean space, a set is compact if and only if it is closed and bounded. Next, Section 4 takes a closer look at a model for Information Assurance upon which our new metric is based.

2.3  AN  INFORMATION  ASSURANCE  MODEL 

In order to develop a reference and working model for our exploration into the fundamentals of cyber-security and cyber-physics, a Turing Machine [110] is used to characterize system operation. The Turing Machine is one of the most fundamental general computing abstractions and is well known in computer science. It has a rich theory of its own that this report intends to utilize to its advantage. The Turing Machine consists of a seven-tuple $(Q, T, I, \delta, b, q_0, q_f)$. Q is a set of states, T is a set of tape symbols, I is a set of input symbols, b is a blank, q_0 is the initial state, and q_f is the final state. δ is the next move function; it maps a subset of $Q \times T^k$ to $Q \times (T \times \{L, R, S\})^k$. L, R, and S indicate movement of the tape to the left, right, or stationary, respectively. There can be multiple tapes. Given a current state and tape symbol, δ specifies the next state, the new symbol to be written on the tape, and the direction to move the tape. The set of symbol strings that lead to an accepting state (q_f) is the input language (ℒ). One approach to the study of security is to consider the Turing Machine as representing normal operation of an information system. In such an approach, if the Turing Machine recognizes, or accepts, an input language, then a user has gained access to the system. If the Turing Machine accepts a language that we did not anticipate, then the system is vulnerable, as stated in Hypothesis 4.1.
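To make the idea of an unanticipated accepted language concrete, the sketch below simulates a single-tape deterministic Turing Machine and lists the strings, up to a length bound, that the machine accepts even though the designer did not anticipate them. It is a toy under stated assumptions: the simulator uses one tape, a step bound in place of a halting test, and a hypothetical example machine (`FLAWED_DELTA`) that is meant to accept only strings of ones but contains one stray transition. None of these names come from the report.

```python
from itertools import product

BLANK = "_"


def accepts(delta, q0, qf, word, max_steps=1000):
    """Run a single-tape deterministic Turing Machine on `word`.

    delta maps (state, symbol) -> (next_state, written_symbol, move), with
    move in {"L", "R", "S"}. Returns True if qf is reached within max_steps;
    returns False if the machine gets stuck or the step bound is exhausted.
    """
    tape = dict(enumerate(word))
    state, head = q0, 0
    for _ in range(max_steps):
        if state == qf:
            return True
        key = (state, tape.get(head, BLANK))
        if key not in delta:
            return False
        state, tape[head], move = delta[key]
        head += {"L": -1, "R": 1, "S": 0}[move]
    return state == qf


def unanticipated(delta, q0, qf, alphabet, anticipated, max_len):
    """Accepted words of length <= max_len that fail the `anticipated` test."""
    found = set()
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            word = "".join(chars)
            if accepts(delta, q0, qf, word) and not anticipated(word):
                found.add(word)
    return found


# Hypothetical machine: intended to accept only non-empty strings of ones.
FLAWED_DELTA = {
    ("start", "1"): ("scan", "1", "R"),
    ("scan", "1"): ("scan", "1", "R"),
    ("scan", BLANK): ("accept", BLANK, "S"),
    ("scan", "0"): ("accept", "0", "S"),  # stray transition: the flaw
}


def all_ones(word):
    """The language the designer anticipated."""
    return len(word) > 0 and set(word) == {"1"}
```

Calling `unanticipated(FLAWED_DELTA, "start", "accept", "01", all_ones, 3)` enumerates every binary string of length at most three. Such bounded enumeration only samples the input space, and the step bound cannot distinguish a slow machine from one that never halts, so this is an illustration rather than a practical analysis method.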


Clearly, the Turing Machine is an abstract representation of any protocol implementation or operating component. The set of unanticipated input languages that is accepted is the vulnerability of the component, V, as shown in Definition 4.1. This is illustrated in an existing tool that GE Research has developed [43], which displays system vulnerabilities from a Unix operating system running Internet Protocol data communications. This work requires a definition of security, shown in Definition 4.1. Our use of a metric space requires us to prove that Definition 3.1 holds.

In order to make the problem of quantifying assurance tractable, consider the fundamental assurance characteristics of an individual Turing Machine. Assume all internal operations are perfectly secure. This follows from [110], in which Turing notes that machine operations are atomic (see footnote 1). It is assumed also that reading, writing, tape control, and state control cannot be observed, modified, or interfered with in any manner. The only effect upon the system is the input language. A malicious input language, given a single Turing Machine, can cause denial of service simply by never halting. It is also assumed that the output tape is accessible to the world. Thus a malicious input language could write secret information to the output tape. What happens with multiple users, each with their own regulated data access?

Another objective of an attack may be to determine the function performed by the Turing Machine. In this case the attacker is assumed to have the ability to enter every member of the input language in order to deduce operation by viewing the output. The attacker is actually deducing δ. Deducing process versus data is described from a complexity viewpoint in more detail in Section 6.3. A Trojan horse is implemented by allowing an attacker access to modify either another user's input tape or perhaps δ of the machine itself.

Given a network of Turing Machines, in which one machine's input is the output of another machine, an input language could be self-replicating, that is, a virus. However, Turing Machines are composable. A single machine with additional states can implement any set of Turing Machines. This implies that a state-based approach should be taken in the analysis of assurance. For example, is state q_x more or less secure than another state? How can this comparison be made within a single Turing Machine? Is the state on a path that leads to an insecure event? An individual state by itself does not reveal much about the current assurance level. However, a set of states, considered along a continuum, provides much more information. Can a gradient be established that leads to the likelihood of an insecure event? Such a gradient requires a well-defined metric, which is what this work is leading towards. One approach towards computing vulnerability would require checking every possible input language in order to quantify relative insecurity levels. This is obviously an intractable approach. This problem is tackled by means of complexity in Section 6.

1. In [110], Turing casually notes the symbols on the machine's tape can form a conditionally compact space.

In order to begin to understand how information assurance can be quantified, consider the manner in which systems that implement information assurance can be designed. Design involves the tradeoff of one benefit for another. Brittle Systems provide a framework for understanding the tradeoffs in performance versus failure of information systems. Brittle systems analysis [108] is based on the idea that systems can fail in a manner analogous to brittle fracture. A system can maintain very high performance until it fails quickly and catastrophically, as illustrated by performance curve Ph in Figure 9, or systems may fail by exhibiting lower performance in a gradual, more ductile manner, as in curve Pt.

Figure 9. Definition of Brittleness.

The mapping between Brittle Systems Theory and information assurance is shown in Table 2. This analysis can be directly applied to the Turing Machine. Changes in any of the state machine parameters, Q, T, I, δ, b, q_0, q_f, may modify the brittleness of the system. For example, addition of a new state and transition could cause the system to behave in a more ductile or brittle manner. What is the measure of performance in a Turing Machine model of information assurance? What does catastrophic failure mean in a Turing Machine model of information assurance? Performance is the quantity a from Definition 5.2, and X is a measure related to attacker effort. The answers to the above questions are intimately linked to the choice of metric. Based on Definition 3.1 discussed previously, one could choose the metric to be the number of state/transition paths available to an attacker to reach a particular target state, or equivalently, the cardinality of the set of languages, V, given in Definition 3.1. Another possible metric could be the proximity of the attacker's current state to the target state. Various other complex metrics could be contrived, such as ones that include the work involved for an attacker to move from one state to another. These complexity metrics are based upon a known attacker at a given state. If a single measurement is required to describe the performance of the information assurance system, then values generated by the choice of metric must be combined in a reasonable manner. However, a single value is not meaningful in the same way that capacity would be useful in determining load, unless there were a meaningful attacker strength with which to operate. Next, in Section 5, Brittle Systems and how they relate to a new information assurance metric are discussed in more detail.

2.4 BRITTLE SYSTEMS, DETERMINISTIC FINITE AUTOMATA, AND VULNERABILITIES

A Deterministic Finite Automaton (DFA) consists of a 5-tuple $(S, I, \delta, s_0, F)$, where S is the set of states, I is the input alphabet, δ is a mapping from S × I into S, s_0 is the start state, and F is a subset of S called the final, or accepting, states. A DFA is less powerful than a Turing Machine, both in the languages it can recognize and in its capability for general computation. However, DFAs have been well studied and provide a framework in which new theories related to Information Assurance can be studied. An example of Brittle Systems using Definition 3.1 for vulnerability is illustrated for the DFA shown in Figure 10. A single vulnerability is represented as a single modified transition. The modified transition represents an error in either the design or implementation that allows an attacker to penetrate the system. The effect of each transition modified from its original source node to each possible destination node in the automaton is exhaustively checked. The effort expended by an attacker is assumed to be proportional to the length of the strings used in the language. P = a (Definition 5.2), and X is the effort of an attacker measured in terms of the language size required to reach an unintended accepting state.

Table 2. Brittle System Definitions

Materials Science | Brittle Systems | Information Assurance
Stress | Amount parameter exceeds its tolerance | Applied force under the weight of an attack
Toughness | System robustness | Encryption strength and sensitivity of intrusion detectors
Ductility | Level of performance outside tolerance | Ability of system to gracefully degrade given an attack
Plastic Strain | Degradation from which the system cannot recover | Trojan horse
Brittle Fracture | Sudden steep decline in performance | Sudden catastrophic collapse of all information assurance
Young's Modulus | Amount tolerance exceeded over degradation | -
Deformation | Degradation in performance | The amount by which vulnerability has been increased due to an attack
Brittleness | Ratio of hardness to ductility | -
Ductile Fracture | Graceful degradation in performance | Ability of information to gracefully degrade under an attack
Reversible Strain | Degradation from which the system can recover | Trojan horse detection and removal
Hardness | Level of performance within tolerance limits | Resistance to decryption

Figure 10. Example of Deterministic Finite Automaton.

Figure 11. Ductile Vulnerability. (Plot: Accepted Strings vs. Language Size.)

Figure 12. Brittle Vulnerability. (Plot: Accepted Strings vs. Language Size.)

The algorithm requires starting with the actual system as represented in Figure 10, modifying a transition, and then recording the number of additional strings accepted. This is repeated for each transition in the base system. As shown in Figure 11, a modification of the transition from State 7, Input 3, Destination State 2, written (7,3,2), yields a small number of vulnerabilities at string length two, with a maximum of 1000 vulnerabilities at string length 8. This performance is ductile compared to the graph shown in Figure 12, where transition (1,3,1) is modified. Figure 12 shows more brittle behavior because it takes a longer string length, and thus more effort by the attacker, to find vulnerabilities; however, the vulnerability increases rapidly as the string length increases. A more precise definition of Brittleness is given in Definition 5.1. The discrete form of brittleness is computed by normalizing the sums of the numbers of strings accepted by systems A and B to be the same, then summing the values where B exceeds A, as in the left side of Figure 13.
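The fault-injection procedure described in this section can be sketched as follows. The code redirects one transition of a DFA and counts, for each string length, how many strings the faulty automaton accepts that the original rejects. A second helper follows the prose description of the discrete brittleness comparison (normalize both curves to the same total, then sum where B exceeds A). Definition 5.1 itself is not reproduced here, so that helper is only one reading of the text, and all function names are hypothetical.

```python
def newly_accepted_counts(delta, start, finals, alphabet, fault, max_len):
    """Per-length counts of strings accepted only after the fault is injected.

    delta maps (state, symbol) -> state; a missing entry means the run dies.
    fault is (source_state, symbol, new_destination).
    Returns a list indexed by string length 0..max_len.
    """
    src, sym, dst = fault
    faulty = dict(delta)
    faulty[(src, sym)] = dst
    # counts[(a, b)] = number of strings driving the original DFA to state a
    # and the faulty DFA to state b (a is None once the original run is dead).
    counts = {(start, start): 1}
    result = []
    for _ in range(max_len + 1):
        result.append(sum(n for (a, b), n in counts.items()
                          if b in finals and a not in finals))
        nxt = {}
        for (a, b), n in counts.items():
            for s in alphabet:
                b2 = faulty.get((b, s))
                if b2 is None:
                    continue  # the faulty run is dead; nothing new can be accepted
                pair = (delta.get((a, s)), b2)
                nxt[pair] = nxt.get(pair, 0) + n
        counts = nxt
    return result


def excess_after_normalizing(curve_a, curve_b):
    """Sum of the amounts by which normalized curve B exceeds normalized curve A."""
    total_a, total_b = sum(curve_a), sum(curve_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("both curves must have a non-zero total")
    return sum(max(b / total_b - a / total_a, 0.0)
               for a, b in zip(curve_a, curve_b))
```

Because the two automata are run in parallel on state pairs, the count is exact without enumerating strings, so longer string lengths remain affordable. Repeating the call for every (source, symbol, destination) triple gives the exhaustive check described above.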


Figure 13. Definition of Brittleness. (Two panels plotting Performance against Input X, with Tolerance (T) marked on the horizontal axis and curves labeled Brittle and Ductile.)

Building upon Definitions 3.1 and 3.2 requires that Turing Machine states, Q, be identified as either secure or insecure. If an attacker can reach a member of q_insecure, then the attacker is considered to have performed a successful attack. If an attacker can never reach a member of q_insecure, then the system is considered invulnerable. The challenge is that neither the attacker nor the defender knows the entire structure of the Turing Machine's program, because the attacker is unlikely to have complete knowledge of the defender's system and because even the defender may not fully understand the system that was developed. The unknown behavior of a system is discussed later in terms of Apparent Complexity in Section 6.5.

Definition 5.3 provides a means for easily computing complexity in the world of finite automata. Next, the relationship between brittleness and complexity is addressed. One might intuit that a faulty transition in a less complex automaton will have less of an impact than a faulty transition in a complex version of the equivalent automaton. The definition of equivalent automata is given in Definition 5.4.

The simplicity, intended to be the opposite of complexity, is given in Definition 5.5 as the difference in size between the current implementation of an automaton and its minimized size.
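The simplicity described above can be illustrated with a short sketch. It assumes that "size" is measured in number of states (the report's Definition 5.5 is not reproduced here and may also count transitions), that the DFA is complete, and that minimization is done by Moore-style partition refinement over reachable states. The function names are hypothetical.

```python
def reachable(delta, start, alphabet):
    """States reachable from `start` in a complete DFA."""
    seen, stack = {start}, [start]
    while stack:
        s = stack.pop()
        for a in alphabet:
            t = delta[(s, a)]
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def minimized_size(delta, start, finals, alphabet):
    """Number of states in the minimal DFA equivalent to the given one."""
    live = reachable(delta, start, alphabet)
    # Start from the accepting / non-accepting split and refine until stable.
    block = {s: int(s in finals) for s in live}
    n_blocks = len(set(block.values()))
    while True:
        ids, refined = {}, {}
        for s in live:
            # Two states stay together only if they are in the same block
            # and every symbol sends them to the same block.
            sig = (block[s],) + tuple(block[delta[(s, a)]] for a in alphabet)
            refined[s] = ids.setdefault(sig, len(ids))
        if len(ids) == n_blocks:
            return n_blocks
        block, n_blocks = refined, len(ids)


def simplicity(states, delta, start, finals, alphabet):
    """States in the implementation beyond those the minimal DFA needs."""
    return len(states) - minimized_size(delta, start, finals, alphabet)
```

An implementation that is already minimal has a simplicity of zero under this reading, while an implementation padded with redundant states has a positive value.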

A simple implementation of an automaton has more transitions and states than necessary to implement the automaton. Thus, there is more opportunity for an attacker to find a weak point in the system. However, once an attacker breaks into a simple system, there will be, on average, more energy, that is, longer string length, required to reach the attacker's destination. Thus, greater simplicity should imply reduced brittleness (Hypothesis 5.1).

Figure 14 and Figure 15 show a simple and a complex implementation, respectively, of the same arbitrary information system. Figure 14, as a simple implementation, is what might be intuitively referred to as an inefficient implementation, with many more states than necessary. This yields the opportunity for more vulnerabilities and faults. However, it also takes the attacker more effort to reach a given target. Figure 15 is a closer representation of the true complexity of the same system. It has fewer opportunities for failure; however, the failures that occur will be more significant.

Figure 15. Complex Version of the DFA Shown in Figure 14.

In Figure 16 and Figure 17, brittleness and complexity are compared. Brittleness is computed as defined in Definition 5.1. Performance is defined based upon the number of accepted strings and language size. The ratio of the number of accepted strings to total language size is inversely proportional to the performance. For each possible fault, this ratio is compared to a consistent base case consisting of an exponentially growing number of accepted words as the language size increases. A brittle system accepts few words initially, then suddenly accepts a large number, while a ductile system accepts a moderate, but gradually increasing number with no sudden increase. The brittle measure is graphed as a function of a fault in the state specified on the horizontal axis. A fault is the disappearance of a state that results in the direct connection of a transition to the destination nodes of the faulty node. Complexity is estimated as the number of transitions in the smallest representation of the resulting faulty system. Comparing Figure 16 and Figure 17, there appears to be an inverse relationship between brittleness and complexity. That is, a system with greater complexity results in lower brittleness. Greater complexity indicates that a larger number of transitions and states exist; thus there is more opportunity for an attack, but more effort is required by the attacker to successfully complete the attack.

Figure 16. Brittle Measure of DFA Shown in Figure 14. (Plot: Brittleness vs. Fault Location.)

Figure 17. Complexity of DFA Shown in Figure 14. (Plot: Density vs. Fault Location.)

In Figure 18 and Figure 19 a similar analysis is performed on the more compact, or truer representation of the complexity, of the same system. Notice that the system with an implementation that is closer to its true complexity is much more brittle. Also, note that the inverse relationship between complexity and brittleness holds in the more complex system as well.

Figure 18. Brittle Measure of System Shown in Figure 14. (Plot: Brittleness vs. Fault Location.)

Figure 19. Complexity Measure of System Shown in Figure 14. (Plot: Complexity vs. Fault Location.)

An important result in this exploration of the relationship among vulnerability, complexity, and brittleness is that the larger the system, in terms of the number of transitions and states, the lower its brittleness. This suggests that larger systems, requiring traversal of larger numbers of states and transitions to reach an accepting state, or attack target, require more effort to successfully attack. A system that has a large amount of inherent complexity cannot be made any more compact than its Kolmogorov Complexity, which is discussed later. An intelligent attacker may be able to observe an inefficiently implemented system and reduce it to its most compact form, that is, its Kolmogorov Complexity, thus easily identifying paths of attack to reach specific targets. A truly safe system is thus obtained, not by building inefficiency into the system, but rather by making the view to the attacker as inherently complex as possible.

2.5  KOLMOGOROV  COMPLEXITY 

Information must be accessible to legitimate users while access is denied to potential attackers. This is done by increasing the apparent complexity of access to information and by providing legitimate users with enough a priori knowledge to reduce the apparent complexity. This leads one to conclude that complexity itself would be a useful metric for Information Assurance. However, the search for an absolute measure of complexity is a problem that may be equally as hard as quantifying Information Assurance itself. There is a good reason for this; they are, in a sense, one and the same thing. To show this, begin with the definition of complexity. Kolmogorov complexity is a measure of descriptive complexity that refers to the minimum length of a program such that a universal computer can generate a specific sequence. Kolmogorov complexity is described in Definition 6.1, where φ represents a universal computer, p represents a program, and x represents a string. Universal computers can be equated through programs of constant length; thus a mapping can be made between universal computers of different types.

Kolmogorov Complexity is proposed as a fundamental property of information that has properties of conservation that may be exploited to provide information assurance. In this report, Kolmogorov Complexity is reviewed and current work in this area is explored for possible applications in providing information assurance. The concept of Minimum Message Length is explored and applied to information assurance, yielding examples of possible benefits for system optimization as well as security that can be achieved through the use of Kolmogorov Complexity based ideas. Finally, complexity based vulnerability analysis is demonstrated through simulation in Section 9.

Currently, information security is achieved through the use of multiple techniques to prevent unauthorized use. Encryption, authentication/password protection, and policies all provide some level of security against unauthorized use. But other than simply relying on these secure barriers, how does one measure the health of a security system? If a password or encryption key is compromised, what indication will be available? The degree to which a system is compromised is difficult to ascertain. For example, if one password has been guessed, or two encryption keys determined, how secure is the information system? Are all detectable security issues equal, or are some more important than others? These difficulties reflect the fact that there is no objective, fundamental set of parameters that can be evaluated to determine if security is maintained. Insecurity may not be detected until an absurd result (rat in a tank) discloses the presence of an attacker. An inherent property of information itself is desired that can be monitored to ensure the security of an information system. The descriptive complexity of the information itself, the Kolmogorov Complexity, is a strong candidate for this purpose.




Complexity and vulnerability in information assurance

Progress in information assurance cannot proceed without fundamental measures. Measurement requires that information assurance be identified and quantified. In order to make progress towards this goal, the results of a study in the evolution of the complexity of information are presented. An underlying definition of information security is hypothesized based upon attacker and defender as reasoning entities, capable of innovation. This leads to a study of the evolution of complexity in an information system and the effects of the environment upon the evolution of information complexity. Understanding the evolution of complexity in a system enables a better understanding of where to measure and how to quantify vulnerability and should lead towards a calculus of information system complexity. Finally, the design of the tool under construction for automated measurement of information assurance, used to gather and analyze the complexity data in this report, is presented. The motivation for complexity-based vulnerability analysis comes from the fact that complexity is a fundamental property of information. If the interaction of information complexity with its environment can be understood, then a new understanding of information assurance may be possible, one in which assurance can be better understood and measured. Quantification is necessary because tools have been developed to measure and analyze security assuming that rigorously defined security metrics exist. Unfortunately, such metrics do not yet exist.

One method for examining information assurance is to consider its converse, insecurity and vulnerability. Vulnerability analysis tools today require types of vulnerabilities to be known a priori. This is unacceptable, but understandable given the challenge of finding all potential vulnerabilities in a system. Information assurance is a hard problem in part because it involves the application of the scientific method by a defender to determine a means of evaluating and thwarting the scientific method applied by an attacker. This self-reference of scientific methods would seem to imply a non-halting cycle of hypothesis and experimental validation being applied by both offensive and defensive entities. Information assurance depends upon the ability to discover the relationships governing this cycle and then quantify and measure the progress made by both an attacker and defender. This work attempts to lay the foundation for quantifying information assurance in such an environment of escalating knowledge and innovation.

Any vulnerability analysis technique for information assurance must account for the innovation of an attacker. Such a metric was suggested about 700 years ago by William of Occam [94]. Occam's Razor has been the basis of much of this invention and the complexity-based vulnerability method to be presented. The salient point of Occam's Razor and complexity-based vulnerability analysis is that the better one understands a phenomenon, the more concisely the phenomenon can be described. This is the essence of the goal of science: to develop theories that require a minimal amount of information. Ideally, all the knowledge required to describe a phenomenon can be algorithmically contained in formulae, and formulae that are larger than necessary indicate lack of a full understanding of a phenomenon. The working hypothesis in this report is that vulnerabilities are locations of low complexity.

Next consider the attacker as a scientist trying to learn more about his environment. In this case, the environment is an Information System. The attacker as scientist will generate hypotheses and theorems. Theorems are attempts to increase understanding of the universe by assigning a cause to an event, rather than assuming all events are random. From [94] and Definition 5.1 above, if x is of length l(x), then a theorem of length l(m), where l(m) is much less than l(x), is not only much more compact, but also $2^{l(x)-l(m)}$ times more likely to be the actual cause than pure chance. Thus, the lower the complexity of the theorem, as stated by Occam's Razor, the more likely the theorem is to be correct.
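As a quick worked instance of this likelihood factor, take an observation of l(x) = 100 bits and a candidate theorem of l(m) = 20 bits. Both lengths are illustrative assumptions, not values from the report.

```latex
\[
2^{\,l(x)-l(m)} = 2^{100-20} = 2^{80} \approx 1.2 \times 10^{24}
\]
```

Each additional bit saved by the theorem doubles the factor, which is why even a modest reduction in description length counts as strong evidence for a cause over pure chance.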

Consider Figure 1 and Figure 2 and notice how one intuitively views current as an attacker's strength and voltage differential as the desire or pressure of an attack upon a system. Resistance is intuitively viewed as the ability of the system to block an attack. Thus, following this intuition, one can view vulnerability as inversely proportional to resistance and directly proportional to the complexity of the system as viewed by an attacker. Capacitance and inductance might be intuitively viewed as relating to the brittleness of the system. Brittle Systems analysis is discussed later; however, it relates to the tradeoff in performance and degradation of a system. This intuition is captured in Hypothesis 6.1.

This hypothesis is significant because it sets the direction of our efforts and determines the fundamental basis upon which the remaining work rests. As the attacker is refining the theorems (that is, reducing their complexity), the defender is attempting to raise their complexity, while still maintaining (low complexity) access for legitimate users. Thus there is an ever-increasing cycle of complexity (see Figure 20). It is as though, while a scientist studies natural phenomena, nature actively tries to hide its true internal operation through the addition of more complexity.

Figure 20. Evolution of Complexity Caused by Attack and Defense. (Horizontal axis: Time.)

Kolmogorov Complexity is a measure of descriptive complexity contained in an object. It refers to the minimum length of a program such that a universal computer can generate a specific sequence. A good introduction to Kolmogorov Complexity is contained in [118], with a solid treatment in [97]. Kolmogorov Complexity is related to Shannon entropy, in that the expected value of K(x) for a random sequence is approximately the entropy of the source distribution for the process generating the sequence [118]. However, Kolmogorov Complexity differs from entropy in that it relates to the specific string being considered rather than the source distribution. Kolmogorov Complexity can be described as follows:
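In the notation used for Definition 6.1 (φ a universal computer, p a program, x a string, and l(p) the length of p), the standard textbook statement of this quantity can be written as below. It is offered as a restatement in that notation and may differ in detail from the report's own wording of the definition.

```latex
\[
K_{\varphi}(x) = \min_{p \,:\, \varphi(p) = x} l(p)
\]
```

Read aloud: among all programs p that cause φ to output x, take the length of the shortest one.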

Random strings have rather high Kolmogorov Complexity, on the order of their length, as patterns cannot be discerned to reduce the size of a program generating such a string. On the other hand, strings with a large amount of structure have fairly low complexity. Universal computers can be equated through programs of constant length; thus a mapping can be made between universal computers of different types, and the Kolmogorov Complexity of a given string on two computers differs by known or determinable constants. The Kolmogorov Complexity K(y|x) of a string y given string x as input is described by Definition 6.2, where l(p) represents the length of program p and φ is the particular universal computer under consideration. Thus, knowledge or input of a string x may reduce the complexity or program size necessary to produce a new string y.
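Using the same symbols (l(p) for program length, φ for the universal computer), the conditional quantity is conventionally written as follows. This is the standard form from the literature, given here as an assumption about what Definition 6.2 states and not as a quotation of it.

```latex
\[
K_{\varphi}(y \mid x) = \min_{p \,:\, \varphi(p,\, x) = y} l(p)
\]
```

When x carries no useful information about y, the shortest program gains nothing from the input and the conditional complexity is essentially the unconditional complexity of y.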

The major difficulty with Kolmogorov Complexity is that it cannot be computed. The length of any program that produces a given string is an upper bound on the Kolmogorov Complexity for this string, but the lower bound [97] cannot be computed. A best estimate of Kolmogorov Complexity may be useful in determining and providing Information Assurance due to links between Kolmogorov Complexity and information security that will be discussed later. Various estimates have been considered, including compressibility, or pseudo-randomness, which measure the degree to which strings have patterns or structure. A new metric that is related to the power spectral density of the sequence auto-correlation is introduced in a later section. However, all metrics are at best crude estimates. The inability to compute Kolmogorov Complexity persists as the major impediment to widespread utilization.
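The compressibility estimate mentioned above can be sketched in a few lines. This is a minimal illustration, assuming Python's standard `zlib` compressor as a stand-in for whatever compressor one prefers; the function name `inverse_compression_ratio` and the two sample strings are hypothetical choices made for the example. The compressed length is only an upper-bound proxy for Kolmogorov Complexity, never the true value.

```python
import random
import zlib


def inverse_compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means more structure)."""
    if not data:
        raise ValueError("data must be non-empty")
    return len(zlib.compress(data, 9)) / len(data)


# A highly structured string: a two-byte pattern repeated 500 times.
structured = b"ab" * 500

# A pseudo-random string of the same length (fixed seed for repeatability).
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(1000))

icr_structured = inverse_compression_ratio(structured)
icr_noisy = inverse_compression_ratio(noisy)
print(f"structured: {icr_structured:.3f}  pseudo-random: {icr_noisy:.3f}")
```

The structured string compresses to a small fraction of its length, while the pseudo-random one does not shrink at all, which mirrors the contrast between low and high complexity strings drawn in the text.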

Despite the problems with measurement, Kolmogorov Complexity and information assurance are related in many ways. Cryptography, for example, attempts to take strings that have structure and make them appear random. The quality of a cryptographic system is related to the system's ability to raise the apparent complexity of the string, an idea discussed in detail later, while keeping the actual complexity of the string relatively the same (within the bounds of the encryption algorithm). In other words, cryptography achieves its purpose by making a string appear to have a high Kolmogorov Complexity through the use of a difficult or impossible to guess algorithm or key. Security vulnerabilities may also be analyzed from the viewpoint of Kolmogorov Complexity. One can even relate insecurity fundamentally to the incomputability of Kolmogorov Complexity and show why security vulnerabilities exist in a network. Vulnerabilities can be thought of as the identification of methods to accomplish tasks on an information system that are easier than intended by the system designer. Essentially, the designer intends for something to be hard for an unauthorized user, and the attacker identifies an easier way of accomplishing this task. Measuring and keeping track of a metric for Kolmogorov Complexity in an information system provides a method to detect such short-circuiting of the intended process. Note that the definition of complexity rests upon the notion of the Turing Machine, which is also the basis of our information assurance model. One direction in this research is to combine the definition of complexity with the definition of vulnerability in a fundamental manner. This has been done using Brittle Systems and Hypothesis 6.1. Next, Section 6.2 examines estimates of Kolmogorov Complexity.

Measures of information complexity

This section looks at proposed estimates of Kolmogorov Complexity. The microscopic approach to the study of information complexity evolution begins by considering the change in complexity of a single interaction. Later this report expands the results to a larger scale and discusses the results. However, in order to make the problem of estimating complexity tractable, two approaches are used. The first approach is based upon Finite Automata. A Finite Automaton whose smallest accepted string is the bit-string whose complexity is to be determined, shown in Definition 6.3, is minimized using techniques such as [79,80], where L(FA) is the language (the set of strings) accepted by the automaton FA and l(FA) is its size.

Figure 21 illustrates the uncompressed representation of an arbitrary bit-string, $101_2$. This automaton accepts many other bit-strings in addition to $101_2$; however, the size of the minimized automaton that accepts that bit-string, based upon the number of transitions, for example, is an estimate of the complexity of $101_2$. Thus, minimizing $101_2$ also minimizes other bit-strings such as those formed by the regular expression 0*1+0+1, where the * indicates zero or more of the preceding symbol and + indicates one or more of the preceding symbol. However, $101_2$ is the smallest string accepted by the automaton.

Figure 21. Finite Automaton Representation of $101_2$.
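The claim that 101 is the smallest accepted string can be checked mechanically. The sketch below assumes the regular expression reads 0*1+0+1 (zero or more 0s, one or more 1s, one or more 0s, then a single 1) and uses Python's `re` module in place of an explicit automaton; `shortest_accepted` is a hypothetical helper written for this illustration.

```python
import re
from itertools import product

# Pattern as read from the text: zero or more 0s, one or more 1s,
# one or more 0s, then a single 1.
PATTERN = re.compile(r"0*1+0+1")


def shortest_accepted(max_len: int = 8) -> str:
    """Return the first accepted bit-string in length-then-lexicographic order."""
    for length in range(1, max_len + 1):
        for bits in product("01", repeat=length):
            candidate = "".join(bits)
            if PATTERN.fullmatch(candidate):
                return candidate
    raise ValueError("no accepted string within max_len")


print(shortest_accepted())                  # shortest member of the language
print(bool(PATTERN.fullmatch("0011001")))   # a longer member of the same language
```

Enumerating by increasing length guarantees that the first match is a shortest one, so no shorter accepted string can have been missed.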

The Mathematica function UnionAutomata returns an Automaton that is the union of two automata. Notice that the complexity of the union of two automata is less than the sum of the complexity of each automaton, as shown in Figure 22, graphed as a function of bit-strings representing the base ten integers from 1 to 50. This validates a known theorem from complexity theory discussed next, namely, that the resulting bit-string complexity from a program is less than or equal to the sum of the complexity of the input bit-strings, the length of the program, and a constant. The complexity of the combined bit-strings should be less than the sum of the automata complexities plus a constant that is dependent upon the size of the Universal Turing Machine, sometimes expressed as $H(XY) < H(X) + H(Y) + c$, where H() is the size of the smallest program capable of computing a specified result. Another way in which to view individual or microscopic interaction is to consider that if a program p of length L(p) takes input string x to produce output string y, that is, y = p(x), then Definition 6.4 defines the microscopic change in complexity, where K is the Kolmogorov Complexity and c depends upon the underlying Universal Turing Machine [137].

Figure 22. Complexity of Union of Automata versus Sum of Complexities. (Horizontal axis: Bit String (Base 10).)
Another approach for estimating complexity used in this report is based upon compression. The inverse compression ratio, the ratio of the compressed size to the original length, is used as an estimate of the complexity. A highly complex bit-string cannot be compressed as much as a low complexity bit-string. A plot of automata-based complexity versus compression-based complexity is shown in Figure 23. In the compression formula used, H() is the entropy, n is the bit-string length, w is the number of one bits, and c is the size in bits needed to represent the length of the bit-string. Clearly, the compression-based mechanism provides a more accurate measure of complexity in the Kolmogorov sense; however, the automata-based mechanism has advantages in that automata provide a simplified and convenient mechanism for reasoning about computation at the microscopic level.

Figure 23. Automata and Compression-Based Measures of Complexity. (Horizontal axis: Bit String (Base 10), 0 to 100; series: Finite Automata, Compression.)

As previously discussed, due to its non-computable nature, estimates of K(x) are difficult. Numerous techniques for estimating K(x) are discussed in [97]. The task of estimating K(x) is related to the task of assessing string structure. We now introduce a new primitive approach to this related issue based on the power spectral density of a string's auto-correlation. This approach highlights the ability to gain knowledge of K(x) without any higher knowledge about the system producing string x or the meaning of the information.

The complexity of a binary string may be defined in many ways. A useful complexity measure may be related to properties of the string's non-cyclic auto-correlation. Specifically, given an n-bit binary string, S, where

$$S = \{s(i)\}, \quad 0 \le i < n \qquad (1)$$

$$s(i) \in \{\pm 1\} \quad \forall i \qquad (2)$$

define the non-cyclic auto-correlation, R, as

$$R = \{r(i)\}, \quad 0 \le i < n \qquad (3)$$

where

$$r(i) = \sum_{j=0}^{n-i-1} s(j)\, s(i+j) \qquad (4)$$

From R, calculate the sequence's non-negative power spectral density, $\Phi_i$, by multiplying the Fourier transform of R by its conjugate. The measure for binary string complexity that is formed is denoted by $\Psi$ and is defined as

$$\Psi = \frac{\ldots}{\text{norm. factor}} \qquad (5)$$
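The first two steps of this construction, the non-cyclic auto-correlation of Equation (4) and the power spectral density obtained from it, can be sketched directly. The code below is an illustrative sketch only: the mapping of bit 1 to +1 and bit 0 to -1, the use of a plain discrete Fourier transform over the n correlation values, and all function names are assumptions made for this example. It stops short of Ψ itself, since the normalization in Equation (5) is not reproduced here.

```python
import cmath


def to_pm1(bits: str) -> list[int]:
    """Map a bit-string to the +/-1 alphabet of Equation (2): '1' -> +1, '0' -> -1."""
    return [1 if b == "1" else -1 for b in bits]


def autocorrelation(s: list[int]) -> list[int]:
    """Non-cyclic auto-correlation r(i) = sum_{j=0}^{n-i-1} s(j) s(i+j), Equation (4)."""
    n = len(s)
    return [sum(s[j] * s[i + j] for j in range(n - i)) for i in range(n)]


def power_spectral_density(r: list[int]) -> list[float]:
    """Each DFT coefficient of r multiplied by its conjugate, so every term is >= 0."""
    n = len(r)
    psd = []
    for k in range(n):
        coeff = sum(r[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        psd.append((coeff * coeff.conjugate()).real)
    return psd


r_alt = autocorrelation(to_pm1("1010"))    # alternating string
r_const = autocorrelation(to_pm1("1111"))  # constant string
psd_alt = power_spectral_density(r_alt)
print(r_alt, r_const)
print([round(v, 6) for v in psd_alt])
```

Both sample strings are highly structured, and their auto-correlations show large side-lobes at every lag, in contrast to the "thumbtack" shape expected of a sequence with little structure.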

The motivation for this approach is found in the rich and venerable field of synchronization sequence design. Sequences that have an autocorrelation whose side-lobes are of very low magnitude provide good defense against ambiguity in time localization. Such an autocorrelation function will approximate a "thumbtack" and its Fourier transform will approximate that of band-limited white noise.

The authors of this report expect that Ψ will be of utility in assessing complexity as it relates to the compressibility of a binary string. To begin the testing of this hypothesis, we generated strings from the Markov process diagrammed in Figure 24.


A series of binary sequences of 8000 bits was generated, each for a different value of p. Ψ was computed for each of these strings, and the strings were also packed into 1000-kilobyte files. These were subjected to the UNIX compress routine. The Inverse Compression Ratio (ICR) was computed, which is the size of the compressed file normalized to its uncompressed size, 1000 kilobytes in these cases. The hypothesis is that Ψ and the ICR should vary in a similar manner and that Ψ might be a useful measure of sequence compressibility and hence complexity. The graph in Figure 25 seems to endorse this hypothesis, and further research is motivated.
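The compression half of this experiment can be approximated as follows. This is a sketch under stated assumptions, not a reproduction of the report's procedure: the Markov process is taken to be a symmetric two-state chain in which each bit differs from its predecessor with probability p, `zlib` stands in for the UNIX compress routine, bits are packed eight to a byte, and the names `markov_bits`, `pack`, and `icr` are hypothetical.

```python
import random
import zlib


def markov_bits(n_bits: int, p: float, seed: int = 0) -> list[int]:
    """Two-state chain: each bit differs from the previous one with probability p."""
    rng = random.Random(seed)
    bits = [rng.randint(0, 1)]
    for _ in range(n_bits - 1):
        flip = rng.random() < p
        bits.append(bits[-1] ^ int(flip))
    return bits


def pack(bits: list[int]) -> bytes:
    """Pack bits into bytes, most significant bit first (length must be a multiple of 8)."""
    if len(bits) % 8:
        raise ValueError("bit count must be a multiple of 8")
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for b in bits[i:i + 8]:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)


def icr(data: bytes) -> float:
    """Inverse Compression Ratio: compressed size over uncompressed size."""
    return len(zlib.compress(data, 9)) / len(data)


results = {p: icr(pack(markov_bits(8000, p))) for p in (0.05, 0.25, 0.5)}
for p, ratio in results.items():
    print(f"p = {p:.2f}  ICR = {ratio:.3f}")
```

Under this assumed chain, small p produces long runs of identical bits that compress well, while p = 0.5 yields an essentially incompressible sequence, so the ICR rises with p over this range.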

The above results show that fundamental parameters such as power spectral density of sequence auto-correlation and compressibility are related and follow similar trends. These fundamental metrics are possible candidates for measuring the trend of increase or decrease in K(x). However, also illustrated by these results (the unequal rate of change between the two metrics) are the loose bounds within which estimates of K(x) are related. Other methods of estimating K(x) are described in [97]. In the next section we introduce a method for attacking the issue of loose bounds in order to make complexity metrics useful for the purposes of assessing and providing information assurance.

Figure 25. Variation of Psi and ICR with p.

Since it is not computable, few applications exist for Kolmogorov Complexity. One growing application is a statistical technique with strong links to information theory known as Minimum Message Length (MML) coding [143]. MML coding encodes information as a hypothesis that identifies the presumptive distribution from which the data originated, appended with a string of data coded in an optimal way. The length of an MML message is determined as follows: #M = #H + #D, where #M is the message length, #H is the length of the specification of the hypothesis regarding the data, and #D is the length of the data, encoded in an optimal manner given hypothesis H. As discussed in [143], MML coding approaches the Kolmogorov Complexity, or actual bound on the minimum length required for representing a string of data.
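The two-part accounting #M = #H + #D can be made concrete with a toy hypothesis class. The sketch below assumes the hypothesis is a Bernoulli source whose parameter is stated by giving the count of one bits (costing log2(n + 1) bits), and that the data are then coded at the ideal entropy rate for that parameter. These coding choices and the name `two_part_length` are illustrative assumptions, not the scheme of [143].

```python
import math


def binary_entropy(q: float) -> float:
    """Shannon entropy of a Bernoulli(q) source, in bits per symbol."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)


def two_part_length(bits: str) -> tuple[float, float, float]:
    """Return (#H, #D, #M) in bits for a Bernoulli hypothesis about `bits`."""
    n = len(bits)
    if n == 0:
        raise ValueError("bits must be non-empty")
    ones = bits.count("1")
    # Assumed hypothesis cost: state the count of ones, one of n + 1 values.
    h_len = math.log2(n + 1)
    # Data cost: ideal code length under the stated Bernoulli parameter.
    d_len = n * binary_entropy(ones / n)
    return h_len, d_len, h_len + d_len


for sample in ("0000000000000000", "0110100110010110"):
    h_len, d_len, m_len = two_part_length(sample)
    print(f"{sample}: #H={h_len:.2f} #D={d_len:.2f} #M={m_len:.2f}")
```

For the all-zero string the data part costs nothing once the hypothesis is stated, while for the balanced string the hypothesis buys no saving, so the total exceeds the raw 16 bits. A richer hypothesis class would be needed to capture the structure of the second string.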

Process  vs.  data  complexity 

Complexity can be applied to the problem of information assurance in two ways. As discussed above, conservation of apparent complexity may enable detecting and correcting abnormal behavior. Another method of using apparent complexity for information assurance is in the identification of weak areas or vulnerabilities in the system. Consider the postulate that the more apparently complex the data, the more difficult it is for an attacker to understand the data and exploit the system. Thus, the more apparently complex the data, the less vulnerable it is, and vice versa. One proposed metric for vulnerability relates to evaluating the apparent complexity of the concatenated input and output, K(X.Y). This relates to the joint complexity of the data input to and output from a certain process (Black Box). The lower K(X.Y), the easier the data is for an attacker to understand; thus we will regard K(X.Y) as a measure of data vulnerability. A competing metric is the relative complexity K(Y|X) of the process. This is the "work" done on the information by the process, or the complexity added or removed from X to produce Y. Thus, K(Y|X) is a measure of process vulnerability. The relationship among this set of complexity metrics and the black box process is shown in Figure 26.


Figure  26.  Process  versus  Data  Vulnerabilities. 


Data vulnerability relates to how vulnerable a system is to an attacker knowing information. This type is perhaps best measured by K(X.Y), where the cumulative complexity of input and output data is observed to measure the difficulty an attacker would face in decrypting or identifying messages contained in input and output. For example, hopefully K(encrypted message) appears much greater than K(decrypted message) to the casual observer and is only recognized to be on the order of K(decrypted message) by an authorized user with the correct key after the decryption algorithm has been run.

Process vulnerability relates to a system's susceptibility
to an attacker understanding the processes that
manipulate information. This vulnerability is best
quantified by the complexity injected or removed
from the data by the process at work. For example, a
copy or passthrough process adds little complexity, so
K(Y|X) is near zero. But if encrypted data is sent through
the copy process, K(X.Y) will be high. The attacker
will be unable to discern the messages that are sent,
but can perhaps learn to simulate this particular
black box quite effectively. Whereas if plain text data
is sent through the copy process, K(X.Y) will be low,
and in addition to understanding the process at
work, an attacker may be able to know the particular
messages that are sent. Both vulnerabilities are
undesirable and represent two different dimensions
of vulnerability to be avoided. To make systems
secure one must maximize both process and data
complexity to a non-authorized user while keeping
the systems simple to authorized users. Proper
accounting of K(Y|X) and K(X.Y) throughout the
system will enable both identification of weak areas
and identification of foul play through the
conservation principles discussed earlier.
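
The two dimensions above can be illustrated with a rough sketch.
Since K is not computable, the code below substitutes the length of
a zlib-compressed string as a crude upper-bound estimate; that
substitution, the function names, and the sample data are all
assumptions made for illustration and are not taken from this
report. K(Y|X) is approximated as the extra compressed length
needed for Y once X has been seen.

```python
import random
import zlib


def c(data: bytes) -> int:
    """Compressed length in bytes: a crude upper-bound stand-in for K."""
    return len(zlib.compress(data, 9))


def joint_complexity(x: bytes, y: bytes) -> int:
    """Estimate of K(X.Y): input and output taken together."""
    return c(x + y)


def conditional_complexity(x: bytes, y: bytes) -> int:
    """Estimate of K(Y|X): extra description needed for Y once X is known."""
    return max(0, c(x + y) - c(x))


rng = random.Random(0)  # fixed seed so the illustration is repeatable
ciphertext_like = bytes(rng.getrandbits(8) for _ in range(2000))
unrelated = bytes(rng.getrandbits(8) for _ in range(2000))
plaintext = b"the quick brown fox " * 100

# A copy process (Y = X) adds almost nothing, whatever the data looks like.
copy_cipher = conditional_complexity(ciphertext_like, ciphertext_like)
copy_plain = conditional_complexity(plaintext, plaintext)
# A process whose output is unrelated to its input adds a great deal.
scramble = conditional_complexity(ciphertext_like, unrelated)

# Data complexity: high for ciphertext-like data, low for plain text.
joint_cipher = joint_complexity(ciphertext_like, ciphertext_like)
joint_plain = joint_complexity(plaintext, plaintext)

print("K(Y|X) estimates:", copy_cipher, copy_plain, scramble)
print("K(X.Y) estimates:", joint_cipher, joint_plain)
```

In this sketch the copy process scores low on the process axis in
both cases, while only the plain-text case also scores low on the
data axis, which matches the two-dimensional picture in the text.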

Vulnerability  reduction  by  means  of  system 
optimization 

In  this  section  we  discuss  issues  related  to  system 
optimization  that  can  be  achieved  through  Kolmog¬ 
orov  complexity  and  various  tradeoffs.  Compression 
and  security  are  strongly  linked  in  that  they  are 
bounded  optimally  by  the  most  random  sequence 
that  can  be  produced.  But  smallest  program  size  is 
not  the  only  or  even  most  important  performance 
metric.  Execution  time  must  also  be  considered. 

The  tradeoff  is  indicated  in  Figure  27. 

Through the use of active network techniques [3],
the tradeoff indicated may be dynamically addressed
using a concept called Active Packet Morphing for
network optimization. As shown in Figure 28, by
changing the form of information from data to code
as information flows through, a system can optimize
CPU resources and bandwidth resources. This idea
can be extended to optimize or prevent adverse
effects from critical resources in addition to bandwidth
and CPU. Memory, time of execution or
buffer space could be used to trade off forms of data
representation to optimize certain system parameters.
The ability of data to change form within a system
opens up multiple optimization paths that were
previously invariant in the system. Rigorous security
quantification resulting from this work allows active
packets to morph by adding the required security
overhead along specific communication links such
that the security of the link along with the security of
the morphed packet yield the proper level of security
required by a given policy. Thus, security overhead
is minimized.

Figure 27. Program size versus Speed Tradeoffs.

Figure 28. Active Packet Morphing for Network Optimization.

Another parameter that can optimize system
resources is the knowledge of how a piece of data is
used. MP3 audio is a good example of how leaving
out information (specifically that which is undetectable
by the human ear) can optimize data size. We
introduce here the idea of "necessary" data to augment
the idea of "sufficient" data or sufficient statistics
that represent all information contained in the
original data. A sufficient representation of data
contains all the information that the source data
contains. Necessary data contains only the information
in the source data that the destination
instrument can effectively use. If you efficiently
encapsulate all the information in source data in a
statistical parameter you may have achieved a minimum
sufficient statistic. If you further reduce this
statistic such that you encapsulate only the information
that is usable by the end node, you have
obtained the minimal necessary sufficient statistic.
Thus, ideas related to Kolmogorov Complexity have
tremendous impact on system optimization as well
as security.

Apparent  complexity 

The results in Section 6.2 give an upper bound on
complexity increase due to computational operations,
but perhaps one can do better. In fact, the size
of the shortest program one can find to produce a
particular string is the best estimate for K(x). Since
Kolmogorov Complexity is unknowable, the best
that we can do is estimate well. This introduces the
idea of apparent complexity. It is as if to say "As far as
I know, the complexity of this string is this." The
benefit or possible way to exploit the idea of apparent
complexity is that a user generating a string
should have the best idea of how hard it is to generate
the string. There are many reasons why a user
may not choose to generate a string using the minimal
size program. Perhaps a longer program can
execute faster, or perhaps the generator is unknowingly
using an inefficient process. However, the generator
of a string of data is presumed to have
knowledge of the process used to generate that data.
This may in fact make the non-computability of Kolmogorov
Complexity an asset: a good candidate for
use in providing information assurance. The information
system designer or an authorized user generating
data should have better knowledge of the data
process than an attacker, and an attacker cannot simply
compute the optimal process. Conservation of
apparent complexity enables abnormalities to be
tracked when the expected number of computational
operations is not utilized in transforming
string x into string y. Thus, even if we cannot know
or compute the most efficient process for creating a
string of data, we can at least gain benefit from
ensuring through monitoring resources that the
expected process is used. This type of assurance has
in fact been used informally to detect network security
problems for many years. Discrepancies in computer
account charges have led to detection of
attack [122]. The idea of using Kolmogorov Complexity
provides the possibility of using this type of
technique on a more fundamental level, where
knowledge about the information content would not
be required to determine unauthorized activity. The
term "apparent complexity" is used to reflect the
best measurement of Kolmogorov complexity available
to the party undertaking the measurement.

This section addresses the question of how vulnerability
relates to complexity and how this leads to
the definition of a new metric called apparent complexity,
AK. Figure 29 is another view of the operation
of the system in Figure 8. In Figure 29, system
operation, as designed by the defender, is shown as
the state machine inside the smaller box in the foreground.
Data of a given complexity flows in and out
of the system as shown by the arrows. However, the
main source of vulnerability is unknown or unexpected
behavior that leads to unauthorized access of
information as shown in the empty large box in the
background.

Figure 29. Vulnerability as Unknown Behavior.

Referring to our definition of attacker as scientist,
the attacker can be conceptualized as developing
and experimentally validating hypotheses regarding
operation of the system that will lead to access to
desired information. As previously discussed, the
better the hypothesis, the less random, or more compact,
the string required to represent the behavior of
the system. Kolmogorov Complexity, K, is a
measure of the smallest programmatic representation
for a string that can ultimately be conceived. The
closer the size of the attacker's representation of the
string formed by observed behavior comes to K in
constructing a hypothesis of system operation, the
better understood the system is to the attacker.

However, actually computing K is a challenge currently
out of reach. Instead, a more tractable substitute
is proposed called Apparent Complexity, as
shown in Definition 6.5. The salient feature of this
definition is that AK, while similar to K(x), is relative. It
is relative in this definition to the capability of a user,
r, to define the smallest program.

In this definition, if AK equals l(x), then r has an unreasonable
hypothesis, a hypothesis that does not contribute to
an understanding of the system. If AK equals K(x), then r
has found a hypothesis that perfectly explains system
behavior. Thus, it is reasonable to expect AK to lie in the
range between K(x) and l(x). Note that r can be an individual,
such as a single attacker or single defender, or a
group with a common view of complexity. In fact, a
group of individuals could converge to a common AK by
means of a protocol similar to a routing protocol.

The string x in this application is a list of events recorded by
an observer external to the system, for example, the
attacker or the defender. Initially the defender is
likely to best understand the system. The complexity
of the system, AK in Figure 29, is itself a measure of
the defender's apparent complexity. This is because the
defender's goal is usually to develop the most efficient
system possible.
Thus, the system is in effect the defender's best
approximation of K with respect to generating output;
in other words,, is likely to be near l(x) initially.
Note that the attacker is likely to have, at minimum,
general knowledge of such areas as login procedures
and possible error conditions such as queue overflows,
and will be likely to test limit points where systems
interconnect as being weak links. A defender
can compute its own AK, but how is the attacker's AK
determined? One method is for the defender to determine
all externally observable points and attempt to compute
the attacker's AK from them. The
accuracy of this approximation depends upon the
defender's knowledge of all the externally observable
points. The accuracy also depends on the
defender's knowledge being equal to or better than the
attacker's.
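
A minimal sketch of the idea that apparent complexity is relative
to the observer follows. It assumes, purely for illustration, that
an observer's capability can be modeled as the set of compressors
available to that observer, and it ignores the constant cost of
the decompressor. The function name, the event log, and the choice
of compressors are hypothetical and are not the report's
Definition 6.5.

```python
import bz2
import lzma
import zlib
from typing import Callable, Iterable

Compressor = Callable[[bytes], bytes]


def apparent_complexity(x: bytes, methods: Iterable[Compressor]) -> int:
    """Smallest description of x (in bytes) that this observer can find.

    The literal string is always an available description, so the
    result never exceeds l(x) = len(x).
    """
    best = len(x)
    for compress in methods:
        best = min(best, len(compress(x)))
    return best


# Hypothetical list of externally observable events.
events = b"LOGIN ok\nGET /index\nLOGOUT\n" * 200

attacker_methods = [zlib.compress]  # limited toolkit (assumed)
defender_methods = [zlib.compress, bz2.compress, lzma.compress]

ak_attacker = apparent_complexity(events, attacker_methods)
ak_defender = apparent_complexity(events, defender_methods)
print(len(events), ak_attacker, ak_defender)
```

Because the defender's set of methods includes the attacker's, the
defender's estimate can never be larger than the attacker's in
this model, and both estimates stay at or below l(x).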

Conservation  of  complexity 

Conserved variables enable us to deduce parameters
from the presence or absence of other parameters.
The Law of Conservation of Matter and Energy
[127], for example, allows one to deduce how well a
thermodynamic system is functioning without knowing
every parameter in the system. Heat gain in one
part of the system was either produced by some process
or traveled from (and was lost from) another
part of the system. One knows that if the thermal
efficiency of a thermodynamic system falls below certain
thresholds then there is a problem. On the other
hand, if more heat is produced by a system than
expected, some unintended process is at work. A
similar situation is desirable for information systems:
the ability to detect lack of assurance by the
presence of something unexpected, or the absence
of something that is expected. This seems to be far
from our reach, given that information is easily created
and destroyed with little residual evidence or
impact.

One possible candidate for a conserved variable
in an information system is Kolmogorov Complexity.
Suppose you could easily know the exact Kolmogorov
Complexity K(x) of a string of data, x. You
would essentially have a conserved parameter that
could be used to detect, resolve or infer events that
occur in the system, just as tracking heat in a thermodynamic
system enables monitoring of that system.
Operations that affect string x and cause it to
gain or lose complexity can be accounted for, and an
expected change in complexity should be resolvable
with the known (secured) operations occurring in
the information system to produce expected
changes in complexity. Complexity changes that
occur in a system that cannot be accounted for by
known system operations are indications of unauthorized
processes taking place. Thus, in the ideal case
where Kolmogorov Complexity is known, a check
and balance on an information system that enables
assurance of proper operation and detection of
unauthorized activity is possible. Unfortunately, as
previously discussed, a precise measure of Kolmogorov
Complexity is not computable. We can, however,
determine a bound on the Kolmogorov
Complexity as shown in the theorems below.

Theorems  of  conservation 

Kolmogorov Complexity, K(x), can be thought of as
a conserved variable that changes through computational
operations conducted upon strings. In order
for K(x) to be a conserved variable we must be able
to account for changes in K(x), which must be correlated
with another known value. Theorems 6.1 and
6.2 presented below enable bounds to be placed on
the changes in K(x) that occur due to computational
operations occurring in an information system. The
two theorems below show bounds on the amount of
complexity that can exist due to knowledge of other
strings or conducting computational operations.

While not computable from below, upper bounds
on the increase in Kolmogorov Complexity can be
crudely known by keeping track of the size of programs
that affect data. This bound may be incredibly
loose, as it is quite possible to operate on a string
and make it much less complex than the input. One
would need a method to recognize this simplification.
However, these results provide an intuitively
attractive method for quantifying the "work" performed
by a computational operation on information:
the change in complexity introduced by the
operation. A thorough treatment of bounds related
to K(y|x) and the "Information Distance" between
strings is contained in Bennett et al. [122].

2.6 COMPLEXITY ESTIMATION ALGORITHMS
FOR INFORMATION ASSURANCE

In order to motivate the study of complexity estimators
for information assurance, this section will highlight
recent results using complexity to detect FTP
exploits [97] and Distributed Denial of Service
(DDoS) attacks [118]. General principles and concepts
of complexity-based vulnerability analysis will
also be discussed, providing further potential information
security applications of complexity estimators.

Detection  of  FTP  exploits  using  protocol 
header  information 

In [97], the use of the principle of conservation of
complexity to detect anomalous use of the FTP protocol
for intrusion detection is presented. Protocols
enforce patterns by design, and the level of redundancy
or patterns in protocol information, which
can be measured through complexity metrics, was
hypothesized to be an objective indication of attack
vs. healthy behavior. Here complexity is estimated
using Unix compress, a universal compression algorithm
based on Lempel-Ziv 78 (LZ78). TCP dump data from
FTP control connections, filtered to remove certain
high variance fields such as time stamps, was compressed
using Unix compress and compared to trace
files of healthy sessions. Results, summarized in
Figure 30, indicate that attack sessions, obtained
from running various FTP exploit scripts downloaded
from numerous Internet sites, have measurably
lower complexity (when normalized against
trace length) than healthy FTP sessions.
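
The normalized measure plotted in Figure 30 (inverse compression
ratio) can be sketched as compressed length divided by trace
length. The sketch below is an illustration only: it uses Python's
zlib (DEFLATE) in place of Unix compress, and the two traces are
invented placeholders rather than data from [97], so it does not
reproduce the reported results.

```python
import zlib


def inverse_compression_ratio(trace: bytes) -> float:
    """Compressed size divided by original size (lower = more redundant)."""
    if not trace:
        raise ValueError("empty trace")
    return len(zlib.compress(trace, 9)) / len(trace)


# Two made-up, already-filtered control-channel traces (not real data).
scripted = b"USER anonymous\r\nPASS guest\r\nCWD ~root\r\n" * 40
interactive = b"".join(
    b"RETR report_%d.txt\r\nCWD /pub/dir%d\r\nLIST\r\nSTOR upload_%d.dat\r\n"
    % (i * 7919 % 1000, i * 104729 % 97, i * 31 % 503)
    for i in range(20)
)

ratio_scripted = inverse_compression_ratio(scripted)
ratio_interactive = inverse_compression_ratio(interactive)
print(round(ratio_scripted, 3), round(ratio_interactive, 3))
```

A detector built on this measure would compare the ratio of a new
session against ratios observed for known healthy sessions; the
choice of threshold is left open here.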

These results indicate that the principle of conservation
of complexity applied to FTP exploits
enables detection of inappropriate or unhealthy use
of the FTP protocol. These concepts are summarized
in Figure 31.

Figure 30. Inverse Compression Ratio of Filtered FTP Session
Trace Files For Attacks and Healthy Sessions.

Figure 31. Conservation of Complexity Applied to FTP Exploits.


The FTP protocol specification, RFC 959,
enforces and enables a certain set of behaviors that
results from the rules and specifications of the protocol
being exercised by the allowable space of user
inputs. The large Space of Models shown in the figure
indicates this set of behaviors or models. Ideally,
the allowable space of models would be calculated
from the protocol specifications, resulting in a space
of models consisting of finite state machines, pushdown
automata, or whatever modeling device
achieves the Kolmogorov Complexity for the particular
behavior. Among this set of models would be
models corresponding to healthy behaviors and
models corresponding to attack behaviors. The
results indicate that the complexity of the TCP
dump header information, which can be interpreted
as an objective measure of the size of the model represented
by the data, shows attack behaviors
(when normalized by session size) to be less
intricate or smaller models than normal healthy sessions,
thus enabling detection of exploits through
complexity estimators. Further research is in
progress to expand upon these results, but clearly
better complexity estimators will benefit characterization
of behavior models and the ability to discern
attacks.

Detection of DDoS using differential
complexity of data payload

Distributed denial-of-service (DDoS) attacks are
caused by an attacker flooding the target machine
with a torrent of packets originating from a number
of machines under the attacker's control. These
machines are called 'zombies'. Tools that control
and launch attacks from these zombies against the
target perform the attacks. The attacks can cause
networks to be disabled for extended periods of
time during which customers, employees, and business
partners are unable to access information or
perform transactions. This section describes an
approach that leverages fundamentals of information
complexity to provide a flexible and effective
method for detection of distributed denial of service
attacks. Stated as simply and succinctly as possible,
we hypothesize that information comprising observations
of actions with a single root cause, whether
they are faults or attacks, is highly correlated. Highly
correlated data has a high compression ratio.

The DDoS attack detection algorithm makes use
of a fundamental theorem of Kolmogorov Complexity
that states: for any two random strings X and Y,

    K(XY) ≤ K(X) + K(Y) + c        (1)

where K(X) and K(Y) are the
complexities of the respective strings, c is a constant
and K(XY) is the joint complexity of the concatenation
of the strings. Proof for the above theorem is
described in [161]. Simply put, the joint Kolmogorov
Complexity of two strings is less than or equal
to the sum of the complexities of the individual
strings. The equality holds when the two strings
X and Y are completely random, i.e. they are totally
unrelated to each other. Another effect of this relationship
is that the joint complexity of the strings
decreases as the correlation between the strings
increases. Intuitively, if two strings are related, they
share common characteristics and thus common
patterns. That knowledge can be harnessed to generate
a smaller program that can represent the combined
string.

In terms of detection of DDoS attacks, the property
given by Inequality (1) is exploited to distinguish
between concerted denial-of-service attacks
and cases of traffic overload. The assumption is that
an attacker performs an attack using large numbers
of similar packets (in terms of their type, destination
address, execution pattern, timing, etc.) sourced
from different locations but intended for the same
destination. Thus, there is a high degree of similarity
in the traffic pattern. A Kolmogorov Complexity
based detection algorithm can quickly identify such
a pattern. On the other hand, a case of legitimate
traffic overload in the network tends to have many
different traffic types. There is not much correlation
between the different traffic flows and, in aggregate,
the traffic appears to have a random pattern. Therefore,
our algorithm samples every distinct flow of
packets (distinguished by their source and destination
addresses) to determine if there is a large
amount of correlation between the packets in a flow.
If it is determined to be so, then all suspicious flows
at the node are again correlated with each other to
determine that it is indeed an attack and not a case
of a traffic overload.

The correlation itself is performed in the following
manner. For the collected samples, the probe
calculates a complexity differential over the samples.
Complexity differential is defined as the difference
between the cumulative complexities of individual
packets and the total complexity computed when
those packets are concatenated to form a single
packet. If packets x1, x2, x3, ..., xn have complexities
K(x1), K(x2), K(x3), ..., K(xn), then the complexity
differential is computed as:

    [K(x1) + K(x2) + ... + K(xn)] - K(x1x2...xn)

where K(x1x2x3...xn) is the complexity of the packets
concatenated together as measured in a finite time
interval window (Figure 32). If packets x1, x2, x3, ..., xn
are completely random, K(x1x2x3...xn) will be equal
to the sum of the individual complexities and the
complexity differential will therefore be zero. However,
if the packets are highly correlated, i.e. some
pattern emerges in their concatenation, then the
concatenated packet can be represented by a
smaller program and hence its complexity, i.e.
K(x1x2x3...xn), will be smaller than the cumulative
complexity.

Figure 32. Topology of the experiment.
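
The complexity differential defined above can be sketched as
follows. This is an illustration under stated assumptions, not the
probe implementation used in the experiment: zlib-compressed
length stands in for K, the payloads are synthetic, and the
function names are hypothetical. Because every compressed string
carries a few bytes of fixed overhead, unrelated packets give a
small positive differential in this sketch rather than exactly
zero.

```python
import random
import zlib
from typing import Sequence


def c(data: bytes) -> int:
    """Compressed length in bytes, used here as an estimate of K."""
    return len(zlib.compress(data, 9))


def complexity_differential(packets: Sequence[bytes]) -> int:
    """[K(x1) + ... + K(xn)] - K(x1 x2 ... xn), with c() in place of K."""
    return sum(c(p) for p in packets) - c(b"".join(packets))


rng = random.Random(1)  # fixed seed so the illustration is repeatable


def random_payload(n: int) -> bytes:
    return bytes(rng.getrandbits(8) for _ in range(n))


# Synthetic flows: identical packets versus unrelated packets.
attack_flow = [random_payload(200)] * 20
mixed_flow = [random_payload(200) for _ in range(20)]

attack_diff = complexity_differential(attack_flow)
mixed_diff = complexity_differential(mixed_flow)
print(attack_diff, mixed_diff)
```

A flow would be marked suspicious when its differential exceeds a
threshold chosen for the sampling window; how that threshold is
set is not specified here.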

We compared our technique to a prototype
packet counting algorithm for DDoS detection and
found that our technique is better at discriminating
traffic patterns. The experimental setup consisted of
a set of active nodes arranged in the topology shown
in Figure 32. Node AH-1 continuously generates traffic
consisting of audio packets destined for node AN-2. The
load induced by this traffic is high enough that it is
registered at node AN-1 as a 'suspicious' flow, i.e., a
traffic flow whose complexity differential exceeds
the threshold. The load induced by this traffic flow
is kept constant throughout the experiment. Node
AH-2 generates the attack flow. The load induced by
the attack flow is varied to determine the performance
of the algorithms. The experiment is run
twice, once with only the attack source on (node AH-2
transmitting only) and the next time with both
sources on (both node AH-1 and node AH-2 transmitting).
The rationale is that an attack is essentially
a sustained overload induced for some time interval.
The purpose of the experiment is to determine the
effectiveness of the two techniques in separating and
identifying an attack in the presence of background
traffic.

Figure 33 and Figure 34 show the performance of
the packet-counting and complexity-based
approaches, respectively, as measured against the
load induced by the two sources (in packets per second)
described above. Figure 33 shows that the
packet-counting metric cannot discriminate
between an attack and a true overload. When the
audio source is transmitting in conjunction with the
attack source, any threshold set by the packet-counting
algorithm running on node AN-1 will be
exceeded, leading to the false conclusion that the
node is under attack. For example, based on the
attack pattern only (dashed curve), we decide to set
the threshold at 70 packets/s for a load of 0.6. When
the audio source is introduced, the combined traffic
trips the same threshold at a load of only 0.4, which
is a false positive.

Figure 33. Performance of packet counting metric.

Figure 34 shows the complexity differential versus
load curve for a given sampled time interval, which
in this case was 10 seconds. Note that higher differential
complexity corresponds to reduced complexity
of the flow. In effect, the higher differential
complexity estimates the deviation from the randomness
inherent in a healthy network with a mix of
different traffic flows. The experiment thus shows
that the attack flow is estimated to be less complex
over time than the ambient legitimate traffic. It is to
be noted that the complexity-based metric does not
change its behavior when a combination of attack
and traffic sources is used. This is because the attack
traffic dominates the combined flow and hence the
complexity differential is roughly equal to that
observed if only the attack flow existed. Therefore,
the complexity-based approach is more accurate in
separating false alarms from true attacks because it
can conserve salient patterns of a traffic flow
(Figure 34).

Figure 34. Performance of complexity-based metric.

Complexity-based  vulnerability  analysis 

Complexity-based vulnerability analysis attempts to
determine the likelihood of attack innovation. An
attacker initially views a system as a black box. The
attacker must form hypotheses about the system and
test those hypotheses to successfully prosecute an
attack. The hypothesis suggested by complexity-based
vulnerability analysis is that less complex components
of the system will be easier to understand,
quicker to be manipulated, and are therefore more
vulnerable. The ability of an attacker to understand,
and thus successfully innovate a new attack against, a
system component is directly related to the size of
the minimal description of that component.

A common criticism of complexity-based vulnerability
analysis is that components that are more complex
could be vulnerable for precisely the same
reason; namely, the defender does not fully understand
high complexity components, thus leaving
potential vulnerabilities. However, even in this
circumstance, the fundamental hypothesis remains
valid: the attacker must understand components
well enough to manipulate them; it is the likelihood
of the attacker understanding component manipulation
well enough that remains the object of
complexity-based vulnerability analysis. In fact, the
Internet Protocol suite, which claims simplicity as
one of its virtues, is a popular target of attack; simplicity
(the colloquial use of the term is used here
because the Internet Protocol suite lacks a definition
of simplicity) has not appeared to reduce its vulnerability.

The technique used by the attacker can be
broadly defined as an attempt to move the system
into a state unanticipated at system design time. For
purposes of simplification, consider the system as
described by a finite automaton (FA) in a black box.
The attacker can input data to the FA and receive
output. The attacker can use patterns in the
input/output data to deduce information about the
system. The complexity-based vulnerability hypothesis
can be restated more precisely as: lower complexity
components of the system, from the attacker's
point of view, that move the system into a state unanticipated
in system design, will be deduced sooner.

The complexity-based vulnerability hypothesis is a
meta-hypothesis because it is a hypothesis involving
an attacker's hypotheses. The meta-hypothesis must
be validated by experimentation. An ideal experiment,
assuming careful setup and control, is presented
in [109]. The use of complexity metrics for
vulnerability analysis is, along with the applications
discussed above, one of the possible objective uses of
complexity theory that will benefit from accurate
and low overhead complexity metrics, which is the
subject of this report.

Methods of estimating complexity

The previous section identifies two methods for estimating
complexity: empirical entropy and universal
compression algorithms. Both metrics are related to
Kolmogorov Complexity, in that K(x) is the ultimate
compression bound for a given finite string x. Thus
any universal compression algorithm is a natural
choice for a complexity estimator. However, since
universal compression algorithms are designed to
apply to populations of strings, the ultimate compression
bound for a specific string is generally
smaller than that achieved by a universal compressor.

Figure 35. Hierarchy of computational platforms in estimating
complexity. (Levels shown, from most to least capable:
Universal Turing Machine, Pushdown Automata, Finite
Automata, Binary Trees. Kolmogorov Complexity: theoretical
bounds based on the universal Turing machine. The estimation
method is determined by available resources: computational
capability of the device, time required for computation,
memory needed for computation, and models to be considered.
Computational mechanics starts with the least computationally
capable machine and works up.)

Figure 35 describes how the quality of a complexity
estimator is tied to the computational model considered
in estimating complexity. Simple estimates,
such as empirical entropy, lie at the bottom of
Figure 35 and can be implemented using very simple
computational platforms with very little overhead.
Popular universal compression algorithms can
be implemented with context free grammars (finite
automata) and provide additional accuracy in complexity
estimation at additional computational
expense. Kolmogorov complexity is the theoretical
limit for complexity estimation that requires computational
capability equal to that of a universal Turing
machine on which anything computable can be
computed. Due to the halting problem in searching
for the theoretical limit for a specific string (we cannot
determine if a candidate compressor will ever
halt) the theoretical bound cannot be achieved. One
example of a universal compression algorithm is the
universal compression algorithm designed by Lempel
and Ziv which is known as LZ78 [183]. The LZ78
algorithm partitions a string into prefixes that it
hasn't seen before, forming a codebook that will
enable long strings to be encoded with small indexes
(Figure 36).

Tree Partition: 1,0,11,01,00,10,011,010,0100,111,01001,001,100,010

Figure 36. LZ78 binary tree representation of the partition
for the binary string 1011010010011010010011101001001100010.
Nodes contained in the partition are colored in black.

Consider an example to illustrate how this algorithm works.
LZ partitioning of the string

1011010010011010010011101001001100010

is performed by inserting a comma each time a sub-string
that has not yet been identified is seen. The following
partition results:

1,0,11,01,00,10,011,010,0100,111,01001,001,100,010 

Figure 36 represents this string partition as a binary
tree. The nodes marked in black in the five-level tree
shown are nodes contained in the LZ78 partition of the
example string. Nodes that are not filled in indicate code
words or phrases that are not contained in the LZ78
partition. Each node or phrase occurs exactly once in the
partition, with the exception of the last phrase, which may
be a repeat of a previously seen node. Good compression
(low complexity estimation) results when the LZ78 partition
forms a deep, sparse tree, while poor compression (high
complexity estimation) results from strings whose trees are
less deep and more completely populated at each level.
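The partition rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this report; the function name lz78_partition is hypothetical. Run on the example string, it reproduces the partition listed above, including a final phrase that repeats an earlier one.

```python
def lz78_partition(bits):
    """Split a binary string into LZ78 phrases.

    Each phrase is the shortest prefix of the remaining input that has
    not already appeared as a phrase.
    """
    seen = set()      # phrases already in the codebook
    phrases = []      # the partition, in order of appearance
    current = ""
    for bit in bits:
        current += bit
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:
        # The input ended inside a phrase that was already seen, so the
        # final phrase may repeat an earlier one.
        phrases.append(current)
    return phrases


example = "1011010010011010010011101001001100010"
print(",".join(lz78_partition(example)))
# 1,0,11,01,00,10,011,010,0100,111,01001,001,100,010
```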

A  comparison  of  ubiquitous  complexity 
estimators 

This section compares the performance of the complexity
estimators used in our work so far, viz. the empirical
entropy, Zlib-compress, and LZ78-code-length estimators.

The empirical entropy estimation technique measures the
weight of 1's that occur in a binary string in relation to
the length of the string. Thus a string with a higher
number of 1's is estimated to have a higher complexity than
a string of the same length with a lower number of 1's.
This estimation technique is computationally simple and
fast but not very accurate.

Zlib-compress uses the java.util.zip.Deflater class found
in the Java compression library [184]. The inverse
compression ratio is taken as the estimate of the
complexity of the string under this method. This universal
compression algorithm utilizes LZ77 coupled with Huffman
coding.

The LZ-code-length estimation method attempts to guess at
the amount of compression possible for a string using the
LZ78 compression algorithm without actually performing it.

The three estimators were compared for accuracy by
conducting the following experiment. A byte buffer was
partitioned into two parts, one of which was filled with
patterned data (i.e., data having a known pattern) and the
other with random data. The estimators were run on the
buffer to get a complexity value for each estimator. The
ratio of random data to pattern data in the buffer was
increased in each successive run set. Thus the first set
had all pattern data in the buffer (and thus low
complexity) while the final set had all random data (and
thus high complexity). The pattern in the patterned data
was varied to prevent bias, and the average complexity
value was taken for each set.
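A rough version of this experiment can be sketched in Python. Several things here are assumptions made for illustration: Python's zlib module stands in for java.util.zip.Deflater (both implement DEFLATE), the empirical entropy estimate is taken to be the binary entropy of the fraction of 1 bits multiplied by the string length (the report does not give the exact formula), and the buffer size, the pattern "ab" and the seed are arbitrary. The LZ78-code-length estimator is omitted.

```python
import math
import random
import zlib


def empirical_entropy_bits(data: bytes) -> float:
    """Estimate complexity from the fraction of 1 bits only (assumed form)."""
    n_bits = 8 * len(data)
    if n_bits == 0:
        return 0.0
    ones = sum(bin(byte).count("1") for byte in data)
    p = ones / n_bits
    if p in (0.0, 1.0):
        return 0.0
    entropy = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return n_bits * entropy


def zlib_bits(data: bytes) -> int:
    """Estimate complexity as the DEFLATE-compressed size in bits."""
    return 8 * len(zlib.compress(data, 9))


def mixed_buffer(size, random_fraction, pattern=b"ab", seed=0):
    """Buffer whose first part is patterned and whose second part is random."""
    rng = random.Random(seed)
    n_random = int(size * random_fraction)
    n_pattern = size - n_random
    patterned = (pattern * (n_pattern // len(pattern) + 1))[:n_pattern]
    noise = bytes(rng.getrandbits(8) for _ in range(n_random))
    return patterned + noise


for fraction in (0.0, 0.5, 1.0):
    buf = mixed_buffer(1024, fraction)
    print(fraction, round(empirical_entropy_bits(buf)), zlib_bits(buf))
```

On the all-pattern buffer the bit-weight estimate stays high because the pattern happens to contain a mix of 0 and 1 bits, while the compressed size is small; this is the kind of overestimate discussed for Figure 37.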

Figure 37 shows a comparison of the three estimators.
Zlib-compress has a linear change in complexity as the
ratio of non-pattern data to pattern data increases and is
the most accurate of the three estimators. LZ-encode is
seen to overestimate the complexity of the data with
respect to Zlib-compress throughout the range of the
experiment. Empirical entropy is seen to be the least
accurate over the range. It grossly overestimates the
complexity when there is more pattern data in the buffer
because it is simply counting the weight of the 1's in the
string instead of looking for patterns.

Figure 37. Complexity estimate in bits vs. randomness of
data (horizontal axis: % non-pattern data).

Figure 38 shows the variance of the estimated values over
the range of the experiment. Zlib-compress can be seen to
be least sensitive to the type of pattern data used.
Empirical entropy estimation variance is highest when
different patterns are used for the same ratio of random to
pattern data. This is because the count of 1's in the
pattern data affects the estimation value. On the other
hand, when the data is more random, the ratio of 1's is
expected to remain close to 0.5.

Figure 38. Complexity estimation variance vs. randomness of
data (horizontal axis: % non-pattern data).

Minimum  description  length  principles 

The previous section identifies the challenges in dealing
with variance and accuracy of complexity estimators. We
will now address a related concept which will be used to
build a new complexity estimation algorithm. Two
statistical techniques for inductive inference that are
quite similar to Kolmogorov Complexity, and that have
strong links to information theory, are known as Minimum
Message Length (MML) coding and Minimum Description Length
(MDL) coding [181]. For our purposes these techniques are
equivalent and will be used interchangeably. MML coding
encodes information as a hypothesis or model that
identifies the presumptive distribution from which the data
originated, appended with a string of data coded in an
optimal way. The total descriptive cost C of a string under
the concepts of MML or MDL can be defined in two parts as
C = M + D, where M is the model cost and D is the data
cost. In the preferred two-part description the model M
describes all regularity associated with a string and the
data portion D describes the random elements of the string,
where the sum of the lengths of these two parts is equal to
the Kolmogorov complexity of the string. In other words,
the best two-part description of the data should not be
longer than the optimal single-part description of the
data. A two-part description is known to exist, if only one
whose model is a set containing the single string itself.
There are generally many two-part descriptions of a string,
the shortest being termed the algorithmic minimum
sufficient statistic. A thorough treatment of algorithmic
statistics for the class of models consisting of finite
sets and probability distributions is contained in [172].

Sophistication 

One criticism of the use of Kolmogorov Complexity for
characterization of information is that it is in some sense
a measure of randomness. Random information is not
necessarily important information. This criticism can be
addressed by thinking of Kolmogorov complexity as two
parts. Sophistication is a measure of meaningful
information that was formalized in [183], harnessing the
fact that the shortest description of an object (with
length equal to its Kolmogorov Complexity) can be expressed
in two parts. The first part describes a Turing machine
that, given the second part as input, produces a given
string. The first part of the code models the regularities
of the string, while the second part describes the
irregularities. The combination of model and code that
constructs a string must together be the smallest under the
Minimum Description Length criteria. Vitanyi in [183]
expands the model space to include not just finite sets but
also any computable model in the recursive function class.
The relationships between Sophistication and Kolmogorov
Complexity are shown in Figure 39.

Figure 39. Comparison of sophistication and Kolmogorov
Complexity. Sophistication is the size of the description
of the model in the smallest two-part description of a
binary string; strings can differ in sophistication even
though their complexities are the same. Recursive function
p is a sufficient statistic for data x if:
soph(x) = min{l(p) : (p, d) is a description of x}.

The motivation for studying sophistication is rooted in a
search for a method to quantify "meaningful information".
In an attempt to rigorously define the normally subjective
notion of "meaning", Vitanyi associates the model with the
meaningful information involved in the string, while the
second part of the code is the meaningless part. For
example, a purely random string of length L has Kolmogorov
Complexity on the order of L. We could compare this to a
string taken from a piece of English text that has
Kolmogorov Complexity of L (the uncompressed string would
have length much larger than L). Even though these two
strings have similar complexity, there are large
differences between their two-part codes. The random string
has no model associated with it and is essentially all
data. The English text will conform to some model
associated with the frequency of use of letters, words and
phrases. If the author of the text is known, a better model
tuned to the writer's vocabulary is possible. Thus, the
minimal two-part code of the sample of English text will
consist of a fair amount of model in addition to data
encoded under the model. The English text sample would
therefore be considered more sophisticated than the random
data even though they are equally complex.

A  new  complexity  and  sophistication 
estimation  algorithm 

Sophistication motivates the search for new complexity and
sophistication metrics that not only indicate the
compressibility or randomness of a given string, but also
indicate information about the size of the model that could
produce the string. Universal compression algorithms are
not designed to do this, since their ultimate goal is to
produce a one-part encoded version of a string that can be
reconstructed into the string at will. In this section, we
derive a heuristic that will lead to a new universal
sophistication and complexity estimator that forms a
two-part code for a given string.

We consider compression of a finite binary string X of
length L. We seek the optimal partition of X into I symbols
that can be encoded using a near-optimal encoding strategy,
such as Huffman coding, such that the combination of the
descriptive cost of the model plus the encoding of the data
under the model is minimized. The following parameters are
defined:

Table 3  OSCR parameters

Parameter   Meaning
L           Length of finite string s
I           Total number of symbols in partition, $i \in [1, I]$
$l_i$       Length of symbol i
$r_i$       Number of repetitions of symbol i in s. If repetitions
            for all symbols are equal then $r_i = r$
R           Total number of repetitions, $R = \sum_i r_i$
$L_i$       Length of s consumed by symbol i, $L_i = l_i r_i$ and
            $L = \sum_i L_i$
$C_p$       Total descriptive cost of s under partition p. Equal to
            the sum of the model description M plus the encoding of
            the data under the given model.
$c_i$       Descriptive cost of symbol i. This parameter will be
            derived in section 4
$D_i$       Symbol Compression Ratio (SCR)

The effect of a partition on MML

The entropy of a distribution of symbols ($H_s$) defines
the average per-symbol compression bound in bits per symbol
for a prefix-free code. For a distribution p of I symbols:

$$H_s = -\sum_i p_i \log_2(p_i)$$

Huffman coding and other strategies can produce an
instantaneous code approaching the entropy when the
distribution p is known. But what is the best encoding
possible when the source distribution is not known? One way
to proceed is to measure the empirical entropy of the
string, that is, the entropy defined inherently by the
input string itself. However, empirical entropy is a
function of the partition and depends on what sub-strings
are grouped together to be considered symbols. See for a
consideration of some of the inadequacies of the well-known
Lempel-Ziv algorithms in dealing with higher order
empirical entropies.
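The dependence of empirical entropy on the partition can be seen in a small sketch. The helper block_entropy is hypothetical and only considers partitions into consecutive fixed-length blocks, which is a much narrower class than the partitions considered in this section.

```python
import math
from collections import Counter


def block_entropy(bits, block_len):
    """Empirical entropy (bits per symbol) when `bits` is cut into
    consecutive blocks of `block_len` bits; a trailing partial block
    is ignored."""
    usable = len(bits) - len(bits) % block_len
    symbols = [bits[i:i + block_len] for i in range(0, usable, block_len)]
    counts = Counter(symbols)
    total = len(symbols)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


x = "10" * 500  # 1000-bit alternating string
print(block_entropy(x, 1))  # 1.0: 0 and 1 are equally frequent
print(block_entropy(x, 2))  # 0.0: every 2-bit symbol is "10"
```

The same string therefore looks maximally random under a 1-bit partition and perfectly regular under a 2-bit partition.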

Our goal is to optimize the partition (the number of
symbols, their length, and distribution) of a string such
that the compression bound for an instantaneous code, which
is equal to $R \cdot H_s$, plus the codebook size is
minimized according to the MML criteria. We estimate the
codebook size (model descriptive cost M) to be the sum of
the lengths of unique symbols:

$$M = \sum_i l_i$$

Thus we estimate the total descriptive cost $C_p$:

$$C_p = M + R \cdot H_s$$

Consider for now that all symbols are equally likely and of
equal length. Thus:

$$r_i = \frac{R}{I} = r, \quad l_i = \frac{L}{R} = l \quad \text{and} \quad H_s = \log_2(I)$$

The last relation describes how entropy changes as unique
symbols are added to the partition, each added symbol
increasing the number of bits required to encode each
symbol in a less than linear fashion. This increased
descriptive cost per symbol must be traded against symbol
length and number of repetitions.

For a given number of unique symbols, more repetitions will
at first tend to decrease the overall description length,
since the fixed-length string of size L will now be divided
into shorter words of size l and the codebook for the
string will now be shorter to describe.

The description length decreases (the reduction in model
size dominates) until a minimum occurs where the benefit
from a decreased codebook size is offset by the fact that
more symbols of a fixed average encoded length (on the
order of $H_s$) must be appended to the description.
Figure 31 plots description length vs. number of repeats
for various sizes of equally likely alphabets based on a
1024 bit string. The knee in the curve for each number of
symbols represents the optimal number of repetitions for a
certain symbol alphabet size.




2.6  Complexity  Estimation  Algorithms  for  Information  Assurance 


The minimum for a given number of symbols in the data can
be calculated as follows:

$$C_p = \sum_i l_i + R \log_2(I) = I\,l + R \log_2(I)$$

where I is the number of symbols contained in the data
(Figure 40).


Figure 40. Entropy vs. number of symbols. More equally
likely symbols in a partition cause the entropy to
increase, raising the bits-per-symbol descriptive cost in a
less than linear manner.


It can be easily shown that for a given number of symbols
I, a minimum description length can be expected if each
equally likely symbol is repeated r times where:

$$r = \sqrt{\frac{L}{I \log_2(I)}}$$

This number represents the optimal tradeoff between
codebook size and encoded data size for a given string
partition into equally likely, equal length symbols
(Figure 41).
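The optimum can be checked with a short derivation, sketched here under the equal-length, equally likely assumptions stated above, so that $l = L/R$ and $R = I r$. The closed form for the minimum cost is a consequence of those assumptions and is not stated in the text.

```latex
\begin{aligned}
C_p(r) &= I\,l + R\log_2(I) = \frac{L}{r} + I\,r\,\log_2(I),\\
\frac{dC_p}{dr} &= -\frac{L}{r^2} + I\log_2(I) = 0
\;\Longrightarrow\; r = \sqrt{\frac{L}{I\log_2(I)}},\\
C_p^{\min} &= 2\sqrt{L\,I\log_2(I)}.
\end{aligned}
```

For L = 1000 and I = 2 this gives r of about 22.4 and a minimum cost of about 89 bits, in line with the 1000 bit example in this section.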

As shown in Figure 41, the benefit of repeated patterns in
fixed size data is overcome more quickly in a large equally
likely alphabet. For the case where the entropy is less
than one (i.e., there is only a single symbol) the benefit
increases until all symbols are repeats, as expected. As an
example of this principle, consider a 1000 bit string

X = 101010101010101010101010101010101010101...

There are many ways to parse this string. As shown in
Figure 42, in an optimal parsing of this string with a two
symbol, equal length, and equally likely alphabet, each
symbol would repeat about 20 times. The optimal descriptive
cost for this partition is approximately 100 bits: 25 bits
to describe each codeword and 40 bits to encode the
codewords in the string. Note that we have ignored the
"comma" cost, which will be required to show separation of
code words in the codebook. The codebook listed below
achieves this partition:

A = 1010101010101010101010101

B = 0101010101010101010101010

with the encoded string represented by ABABABAB... (the
pair AB repeated 20 times).

Figure 41. Descriptive cost of a 1000 bit string vs. number
of repeats for various size symbol alphabets. Symbol length
and number of repetitions of equal length, equally likely
symbols comprising a string of finite length produce
competing effects in total string descriptive cost.

Figure 42. Descriptive cost vs. number of repeats for two
symbol partitions of the 1000 bit string 101010101... .

Two other partitions are noted in Figure 42, both with
similar code words except that the length of each codeword
is different. In the first case (represented by a + in
Figure 42) each symbol is 125 bits long and repeated four
times. In the second case each symbol is 5 bits long and
repeated 100 times. Both cases require a much larger
descriptive cost for the string. In the first case the
additional burden is on the codebook, which must describe
two 125-bit symbols. In the second case the burden is on
the encoding of the data, which must describe 200 symbols
once encoded. As a benchmark, LZ78 can encode this string
in 378 bits.
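The three partitions of the 1000 bit string can be compared numerically. The sketch below assumes the cost model used in this section, $C_p = I\,l + R \log_2(I)$, with equal-length, equally likely symbols and no comma cost; the function name descriptive_cost is hypothetical.

```python
import math


def descriptive_cost(total_len, n_symbols, repeats):
    """Estimated cost C_p = I*l + R*log2(I) in bits for a partition into
    `n_symbols` equally likely symbols, each repeated `repeats` times."""
    total_symbols = n_symbols * repeats          # R = I * r
    symbol_len = total_len / total_symbols       # l = L / R
    codebook = n_symbols * symbol_len            # model cost, sum of l_i
    data = total_symbols * math.log2(n_symbols)  # R * H_s with H_s = log2(I)
    return codebook + data


for r in (4, 20, 100):
    print(r, descriptive_cost(1000, 2, r))
# 4 258.0
# 20 90.0
# 100 210.0
```

With four repeats the cost is dominated by the 250-bit codebook, and with 100 repeats by the 200 encoded symbols; 20 repeats sits near the minimum.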

The above analysis shows that when partitioning a string
both the length of symbols and the number of repetitions of
symbols must be traded off and optimized in order to
minimize descriptive length. We develop a method to treat
non-uniform distributions in the next section.

Symbol  compression  ratio 

In seeking to partition the string so as to minimize the
total string descriptive length $C_p$, we consider the
length that the presence of each symbol adds to the total
descriptive length and the amount of coverage of total
string length L that it provides. As described in
Section 3, the descriptive cost of the model is on the
order of the sum of the lengths of unique symbols in the
partition. The descriptive cost of the encoded data is on
the order of $R \cdot H_s$, where $H_s$ is the entropy of
the symbol partition and R the total number of symbols in
the string. The probability of each symbol, $p_i$, is a
function of the number of repetitions of each symbol:

$$p_i = \frac{r_i}{R}$$

Thus, we have:

$$H_s = -\sum_i p_i \log_2(p_i) = -\sum_i \frac{r_i}{R} \log_2\left(\frac{r_i}{R}\right)$$

Since

$$\sum_i r_i = R$$

we can simplify this to:

$$H_s = \log_2(R) - \frac{1}{R} \sum_i r_i \log_2(r_i)$$

$H_s$ is a measure of the ability to encode the
distribution p of I symbols. The smaller $H_s$, the fewer
bits per symbol are required to encode each symbol. For a
fixed I, $H_s$ is maximized when all I symbols are equally
likely. The total number of symbols R multiplied by $H_s$
yields the length required to encode the data using an
optimal technique. This can be added to the codebook size
to achieve a bound for the descriptive complexity under
partition p (Figure 43).


Figure 43. Estimate of $R \log_2(R)$ (curves: $R \log_2(R)$
and $R \log_2(L/2)$, plotted against R).

Thus, the descriptive length of the string under partition
p is equal to:

$$C_p = R \log_2(R) + \sum_i \left[ l_i - r_i \log_2(r_i) \right]$$
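As a quick numerical check, the sketch below evaluates the descriptive length both as $M + R \cdot H_s$ and in the closed form above, and confirms that the two agree. The partition used (symbol lengths 3, 1, 1 and repetition counts 10, 6, 4) and the function names are chosen only for illustration.

```python
import math


def cost_from_entropy(lengths, repeats):
    """C_p = M + R * H_s, with M the sum of the symbol lengths."""
    total = sum(repeats)
    entropy = sum(-(r / total) * math.log2(r / total) for r in repeats)
    return sum(lengths) + total * entropy


def cost_closed_form(lengths, repeats):
    """C_p = R*log2(R) + sum_i (l_i - r_i*log2(r_i))."""
    total = sum(repeats)
    return total * math.log2(total) + sum(
        l - r * math.log2(r) for l, r in zip(lengths, repeats)
    )


# Illustrative partition: three symbols of 3, 1 and 1 bits,
# repeated 10, 6 and 4 times (R = 20).
lengths, repeats = [3, 1, 1], [10, 6, 4]
print(round(cost_from_entropy(lengths, repeats), 2))
print(round(cost_closed_form(lengths, repeats), 2))
```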

Our goal is to find a per-symbol descriptive cost. In the
equation above, all terms are defined per symbol i with the
exception of the first term. We define the relation:

$$R \log_2(R) = \sum_i r_i \log_2(R) = q \sum_i r_i$$

where q is a constant estimating $\log_2(R)$. For R that
can vary between 2 and L/2 for symbols of size 2 bits or
greater, $\log_2(R)$ can be estimated to enable an
incremental, per-symbol formulation for $C_p$. Estimating
q:

$$q = \log_2\left(\frac{L}{2}\right)$$

results in a conservative approximation for $R \log_2(R)$
over the likely range of R, as shown in Figure 43 for
partitions of strings having length equal to 1000 bits. The
per-symbol descriptive cost can now be formulated:

$$c_i = r_i \left[ \log_2\left(\frac{L}{2}\right) - \log_2(r_i) \right] + l_i$$
$C_p$ is less than the sum of the individual code word
costs $c_i$ due to the conservative approximation for
$\log_2(R)$. A lower bound on the estimate for $c_i$ can be
formed as well to create upper and lower bounds on the
descriptive length $C_p$:

$$c_{\min,i} = r_i \left[ \log_2(r_i) - \log_2(r_i) \right] + l_i = l_i$$

$$\sum_i c_{\min,i} \le C_p \le \sum_i c_i$$

This  bound  can  be  tightened,  as  better  estimates 
of  the  total  number  of  repetitions  in  a  partition 
become  known. 

We now have a metric that conservatively estimates the
descriptive cost of any possible symbol in a string. A
measure of the compression ratio for a particular symbol is
simply the descriptive length that the symbol contributes
divided by the length of the string "covered" by this
symbol. We define the compression ratio of a symbol (SCR)
to be (Figure 44):

$$\mathrm{SCR}_i = \frac{c_i}{L_i} = \frac{r_i \left[ \log_2(L/2) - \log_2(r_i) \right] + l_i}{l_i\, r_i}$$

Figure 44. SCR vs. symbol length for various numbers of
repeats, for a 1024-bit string.


Thus we have a metric to describe the effectiveness for
compression of a particular candidate symbol in a possible
partition of a string, which can be used for comparison in
forming a partition. Examining SCR above, it is clear that
a good symbol compression ratio arises in general when
symbols are long and repeated often. Clearly, selection of
some symbols as part of the partition is preferred to
others.
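A minimal sketch of the SCR computation follows. It assumes that the ratio is the per-symbol cost $c_i = r_i[\log_2(L/2) - \log_2(r_i)] + l_i$ divided by the covered length $l_i r_i$, which is how the text defines it in words; the function names are hypothetical.

```python
import math


def symbol_cost(total_len, sym_len, repeats):
    """Per-symbol descriptive cost c_i = r_i*(log2(L/2) - log2(r_i)) + l_i."""
    q = math.log2(total_len / 2)  # conservative estimate of log2(R)
    return repeats * (q - math.log2(repeats)) + sym_len


def scr(total_len, sym_len, repeats):
    """Symbol compression ratio: cost of the symbol divided by the number
    of bits of the string that it covers (l_i * r_i)."""
    covered = sym_len * repeats
    if covered > total_len:
        raise ValueError("symbol cannot cover more than the whole string")
    return symbol_cost(total_len, sym_len, repeats) / covered


# Two candidates in a 40-bit string: a 3-bit symbol seen 10 times and a
# 2-bit symbol seen 12 times.
print(round(scr(40, 3, 10), 3))  # 0.433
print(round(scr(40, 2, 12), 3))  # 0.452
```

The longer symbol wins here even though it repeats less often, which illustrates the trade between symbol length and repetitions.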

Figure 44 and Figure 45 show how symbol compression ratio
varies with the length of symbols and number of repetitions
for a 1024 bit string. In both figures the discontinuities
reflect where symbol length times number of repeats exceeds
string length, and SCR is therefore undefined (Figure 45).

Figure 45. SCR vs. number of repeats for various symbol
lengths, for a 1024 bit string.


Optimal  Symbol  Compression  Ratio  (OSCR) 
algorithm 

The Optimal Symbol Compression Ratio (OSCR) algorithm forms
a partition of string x into symbols that have the best
symbol compression ratio among possible symbols contained
in x. The concept is to form a codebook dictionary that
provides near-optimal compression by adding one codeword at
a time based on each codeword's symbol compression ratio.
The algorithm is shown in the sidebar.

OSCR ALGORITHM

1. Form a binary tree of all non-overlapping sub-strings
contained in x that occur at least 2 times and note the
frequency of occurrence.

2. Calculate the SCR for all nodes (sub-strings). Select
the sub-string from this set with the smallest SCR and add
it to the model M.

3. Replace all occurrences of the newly added symbol with a
unique character to delineate this symbol. Repeat steps 1
and 2 with the remaining binary string elements until no
binary elements remain.

4. When a full partition has been constructed, use Huffman
coding or another coding strategy to encode the
distribution, p, of symbols.
The following comments can be made regarding this
algorithm:

• This algorithm progressively adds symbols that do the
most compression "work" among all the candidates to the
code space. Replacement of these symbols left-most-first
will alter the frequency of remaining symbols.

• SCR is at first based on the crude estimate for R
discussed previously. Justification of this estimate and
possible iteration of the algorithm can be achieved by
performing the algorithm again upon completion with the
computed value for R defined by the partition calculated by
the algorithm. An unchanged partition validates the output
of the algorithm. If the partition does change, the
algorithm can be iterated using computed values of R until
it converges or repeats a previous case.

• A less exhaustive search for the optimal SCR candidate is
possible by concentrating on the tree branches that
dominate the string.

• The algorithm does not require a prefix-free partition of
the string. The left-most substitution of best SCR symbols
first suffices to produce a unique partition. A prefix-free
code is however assumed in the encoding.

• Reduction of the binary tree size can be achieved by
noting the minimum SCR at each level and considering bounds
from tree nodes (Figure 46).


Figure 46. Binary Tree for a specific string. Nodes
included are in white.


As an example consider the 40-bit string below:

X = 0011001001000010100101001100100110011001

A full five-level binary tree expansion of all sub-strings
contained in this string does not include all possible
nodes. Rather, only certain possible patterns are contained
in any partition; Figure 46 identifies the possible
sub-strings that occur. The first pass of the OSCR
algorithm will produce the binary tree of symbol
frequencies shown in Figure 47. Note that in building this
tree we noted and utilized the fact that at the second
level of the tree the SCR of 12 repetitions of the symbol
01 is < 0.5; thus we did not expand tree nodes with two or
fewer repeats.

As shown in Figure 47, the symbol 001 is repeated 10 times
and has the smallest symbol compression ratio. Substituting
"A" for this symbol produces the string:

X' = A1AA00A01A01A1AA1A1A

Figure 47. Binary Pattern Tree in first pass of algorithm.

Iterating the algorithm shows that the second symbol
candidate, 01, which has the smallest compression ratio
among the remaining candidates, does not promote
compression. Thus the remaining symbols simply substitute
for 1 and 0:

ABAACCACBACBABAABABA
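The greedy procedure can be sketched as follows. This is an illustrative reading of the OSCR steps, not the authors' implementation. The assumptions are: candidates are binary sub-strings of length 2 or more with at least two non-overlapping occurrences, the search is exhaustive rather than tree-pruned, the loop stops when the best SCR is 1 or more (taken here to mean the candidate "does not promote compression"), and symbol names are single uppercase letters. All function names are hypothetical.

```python
import math
import string


def scr(total_len, sym_len, repeats):
    """Symbol compression ratio with q = log2(L/2) estimating log2(R)."""
    q = math.log2(total_len / 2)
    cost = repeats * (q - math.log2(repeats)) + sym_len
    return cost / (sym_len * repeats)


def best_candidate(work, total_len):
    """Return (scr, substring, count) for the binary substring of `work`
    (length >= 2, at least 2 non-overlapping occurrences) with the
    smallest SCR, or None if there is no such substring."""
    best = None
    seen = set()
    for start in range(len(work)):
        if work[start] not in "01":
            continue  # position already replaced by a symbol name
        for end in range(start + 2, len(work) + 1):
            if work[end - 1] not in "01":
                break  # ran into an already substituted symbol
            sub = work[start:end]
            if sub in seen:
                continue
            seen.add(sub)
            count = work.count(sub)  # non-overlapping, left to right
            if count >= 2:
                candidate = (scr(total_len, len(sub), count), sub, count)
                if best is None or candidate < best:
                    best = candidate
    return best


def oscr_partition(bits):
    """Greedy OSCR sketch: returns (model, encoded), where `model` maps
    one-letter symbol names to bit patterns."""
    total_len = len(bits)
    names = iter(string.ascii_uppercase)  # at most 26 symbols in this sketch
    model = {}
    work = bits
    while True:
        best = best_candidate(work, total_len)
        if best is None or best[0] >= 1.0:
            break  # assumed stopping rule: no candidate compresses
        name = next(names)
        model[name] = best[1]
        work = work.replace(best[1], name)  # left-most-first substitution
    while True:
        # Remaining single bits become symbols in order of first appearance.
        remaining = [ch for ch in work if ch in "01"]
        if not remaining:
            break
        name = next(names)
        model[name] = remaining[0]
        work = work.replace(remaining[0], name)
    return model, work


model, encoded = oscr_partition("0011001001000010100101001100100110011001")
print(model)    # {'A': '001', 'B': '1', 'C': '0'}
print(encoded)  # ABAACCACBACBABAABABA
```

On the 40-bit example the sketch selects 001 first and then falls back to single bits, giving the same partition as above.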

This provides the distribution of symbols shown in Table 4.
The entropy of this symbol distribution is 1.48 bits per
symbol. This can be approximated by the Huffman tree shown
in Figure 48, which achieves an expected encoded length of
1.5 bits per symbol.

Figure 48. Huffman Tree for Symbol Partition.

Table 4  Symbol Distributions

Substring   Symbol   Probability   New Code
001         A        0.5           0
1           B        0.3           10
0           C        0.2           11

Simple substitution of the new code results in the encoded
string:

X' = 010001111011100111001000100100

Thus, the encoded message has been reduced to 30 bits from
the original 40 bits. The descriptive cost of the codebook
is estimated as the sum of the lengths of symbols, which is
equal to 5 bits. Depending on the strategy for delineating
the separation between code words and defining the
prefix-free encoding of the codebook, this descriptive cost
could vary. Estimated results compared with LZ78 for
several short strings are shown in Table 5.
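The figures quoted for this example can be reproduced with a short check. The code table is copied from Table 4 rather than derived by running a Huffman construction, and the variable names are illustrative.

```python
import math

# Symbol table from the example: name -> (probability, code word).
table = {"A": (0.5, "0"), "B": (0.3, "10"), "C": (0.2, "11")}
partition = "ABAACCACBACBABAABABA"

entropy = sum(-p * math.log2(p) for p, _ in table.values())
expected_len = sum(p * len(code) for p, code in table.values())
encoded = "".join(table[s][1] for s in partition)

print(round(entropy, 3))       # 1.485
print(round(expected_len, 3))  # 1.5
print(encoded, len(encoded))   # 010001111011100111001000100100 30
```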


Table 5  Encoded Lengths for several short strings.

String                      LZ78   OSCR   Model
0100101101001011010010110   40     0.20   010,11,0
1010101010101010101010101   30     12     1010,1
1110110111101101110111101   30     20     101,11,1

The previous example illustrates the concept of the OSCR
algorithm. As is the case with Lempel-Ziv and other
compression algorithms, greater compression is realized on
strings of longer length. In addition to compression, the
algorithm provides the following benefits:

• The model developed can be used to produce a typical set
of strings to which x belongs.

• The symbol alphabet size of 3 symbols is an inherent
parameter associated with this string that can be used to
compare it with other strings. The symbol alphabet size is
a measurable parameter related to complexity that reflects
the number of variables addressed by the string.

Comparison with Lempel-Ziv 78

The OSCR and LZ78 algorithms share the approach of
dictionary coding strategies, achieving compression by
giving smaller representations to longer repeated strings.
The difference is that the string patterns identified and
indexed in LZ78 are precisely the unique string patterns,
scanned from left to right, that have not been seen before.
No effort is made to construct a partition of repeated
patterns that gives a better encoding than the one that
falls out of the patterns or string phrases that occur
first. In most implementations all substrings are given an
equal-size codeword (index), so a frequently occurring short
substring may actually expand the size of the encoded
string. The benefit of the Lempel-Ziv approach is its
computational simplicity and the ease with which the
dictionary or codebook is communicated. The dictionary is
essentially interleaved in the encoded data, so neither
commas nor explicit communication of the codebook is
required.
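The left-to-right phrase parsing described above can be sketched in a few lines. This is a minimal illustration of textbook LZ78 parsing, not the implementation used to produce Table 5; the function names and the fixed-width cost accounting (`fixed_width_cost`) are assumptions made for the example, so the bit counts it reports are not expected to match the table.

```python
from typing import List, Tuple


def lz78_parse(s: str) -> List[Tuple[int, str]]:
    """Split s into LZ78 phrases: (index of longest known prefix, next symbol)."""
    dictionary = {"": 0}           # phrase -> index; index 0 is the empty phrase
    phrases = []
    current = ""
    for symbol in s:
        candidate = current + symbol
        if candidate in dictionary:
            current = candidate    # keep extending until the phrase is new
        else:
            phrases.append((dictionary[current], symbol))
            dictionary[candidate] = len(dictionary)
            current = ""
    if current:                    # input ended inside an already-known phrase
        phrases.append((dictionary[current], ""))
    return phrases


def lz78_decode(phrases: List[Tuple[int, str]]) -> str:
    """Rebuild the string; the dictionary is recovered from the phrases themselves."""
    table = [""]
    for index, symbol in phrases:
        table.append(table[index] + symbol)
    return "".join(table[1:])


def fixed_width_cost(phrases: List[Tuple[int, str]], symbol_bits: int = 1) -> int:
    """Hypothetical cost model: every phrase pays the same index width."""
    index_bits = max(1, len(phrases).bit_length())
    return len(phrases) * (index_bits + symbol_bits)


sample = "1010101010101010101010101"
parsed = lz78_parse(sample)
print(parsed)
print(len(sample), "input bits ->", fixed_width_cost(parsed), "bits under this cost model")
```

Under this simple accounting the short, highly regular sample string is expanded rather than compressed, which is the effect of equal-size codewords noted above; the decoder needs no separate codebook because the dictionary is implied by the phrase order.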

OSCR takes the other extreme by identifying the repeated
patterns that contribute most to compression of the string,
at the expense of computational requirements. One can
envision a combination of these two philosophies, to be
addressed in future work, that provides a continuum of
gradation between compression gain and computational
requirements.

Comparison of estimators for detection of FTP exploits

The goal of the OSCR algorithm is to improve complexity
estimation in a manner that provides the ability to discern
attack from healthy behavior. Results from Figure 172
indicate a separation of curves for attack vs. healthy FTP
traffic. Widening the separation between these curves will
result in a better ability to discern exploits with fewer
false alarms.

Figure 49 shows the difference in complexity estimation
provided by various complexity estimators for a healthy
session and an attack session, using trace files of about
2 kbits. As shown in the figure, empirical entropy and
straight LZ78 estimation incorrectly discern healthy from
attack behavior. OSCR widens the separation relative to Zip
compress (Zlib), providing a better margin for error in
discerning an attack, despite the fact that Zip compress
provides a better compressor, as shown in Figure 50. Further
data must be taken to validate this gain. Additional gains
in OSCR compression can be made by optimizing the model cost
beyond the simple sum of the codeword lengths (Figure 50).

Figure 49. Comparison of OSCR vs. Zip Compress for FTP data.

[Figure 50 plot. Title: Comparison of OSCR vs. Zip compress.
X-axis: test case (1 to 4).]

Figure 50. Comparison of OSCR vs. Zip Compress compression
ratio.


2.7 DETECTING DISTRIBUTED DENIAL-OF-SERVICE ATTACKS USING
KOLMOGOROV COMPLEXITY METRICS

Distributed denial-of-service attacks are caused by the
attacker flooding the target machine with a torrent of
packets originating from a number of machines under the
attacker's control. These machines are called 'zombies'. The
attacker typically uses ICMP or UDP packets for the attack.
Typical detection techniques [156, 157] for these types of
attacks rely on filtering based on packet type and rate.
Essentially, the detection software attempts to correlate
the type of packet used for the attack, be it ICMP or UDP,
with the destination. While these techniques have reasonable
success, they are not very flexible. For example, these
techniques will fail if a new type of packet is used for the
attack or if the attack consists of a traffic pattern that
is a combination of ICMP and UDP packets. In such cases,
packet profiling is defeated. This report describes an
approach based on fundamentals of information complexity
that is both flexible and effective.

Stated as simply and succinctly as possible, we hypothesize
that information comprising observations of actions with a
single root cause, whether they are faults or attacks, is
highly correlated. Highly correlated data has a high
compression ratio. The Kolmogorov Complexity, K(x), of a
string of data measures the size of the smallest program
capable of representing the given piece of data [10]. It
measures the degree of randomness of the given data. The
length of the shortest program that generates a completely
random string is essentially equal to the size of the string
itself. In all other cases it is smaller than the size of
the string, and the program size becomes smaller as more
regularity or pattern is discernible in the string. A side
effect of this measure is its ability to represent the
correlation between disparate pieces of data. This side
effect is exploited to design an effective method for
detecting DDoS attacks (Figure 51).

Figure 51. Implementation in Magician Active Node.

Approach 

The DDoS attack detection algorithm makes use of a
fundamental theorem of Kolmogorov Complexity that states:
for any two random strings X and Y,

K(XY) <= K(X) + K(Y),     (1)

where K(X) and K(Y) are the complexities of the respective
strings and K(XY) is the joint complexity of the
concatenation of the strings. Simply put, the joint
Kolmogorov complexity of two strings is less than or equal
to the sum of the complexities of the individual strings.
Equality holds when the two strings X and Y are totally
random, i.e., they are completely unrelated to each other.
Another effect of this relationship is that the joint
complexity of the strings decreases as the correlation
between the strings increases. Intuitively, if two strings
are related, they share common characteristics and thus
common patterns. That knowledge can be harnessed to generate
a smaller program that can represent the combined string.
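The behavior of inequality (1) can be observed with any computable stand-in for K. The sketch below uses the compressed length produced by Python's `zlib` module as that stand-in; this is an assumption made purely for illustration and is not the estimator used in this report. The sample byte strings and the helper names `c`, `joint_saving`, and `noise` are hypothetical.

```python
import random
import zlib


def c(data: bytes) -> int:
    """Compressed length in bytes: a crude, computable stand-in for K."""
    return len(zlib.compress(data, 9))


def joint_saving(x: bytes, y: bytes) -> float:
    """Fraction by which c(xy) falls short of c(x) + c(y)."""
    separate = c(x) + c(y)
    return (separate - c(x + y)) / separate


rng = random.Random(0)  # fixed seed so the example is repeatable


def noise(n: int) -> bytes:
    """n pseudo-random bytes, standing in for unrelated data."""
    return bytes(rng.getrandbits(8) for _ in range(n))


pattern = b"ICMP echo request to 10.0.0.5 " * 20
related = joint_saving(pattern, pattern)         # two strings sharing one pattern
unrelated = joint_saving(noise(600), noise(600))  # two unrelated strings

print(f"related strings:   joint saving = {related:.2f}")
print(f"unrelated strings: joint saving = {unrelated:.2f}")
```

For the related pair the concatenation costs far less than the two strings compressed separately, while for the unrelated pair the saving is close to zero, mirroring the near-equality case of inequality (1).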

The concept of "Conservation of Complexity" was introduced
in [210]. This concept relates to the ability to discern an
attack by monitoring the complexity change due to processes
occurring in the system and imposing bounds that identify
unauthorized processes, which are marked by complexity
changes that are either too great or too small to be from
authorized processes. Figure 52 describes the concept of
conservation of complexity.

[Figure 52 diagram: the complexity increase due to a process
is bounded.]

\[ \int_t K_{A\min}\,dt < \int_t K\,dt < \int_t K_{A\max}\,dt \]

Figure 52. Principle of conservation of complexity.

This concept was first applied to a closed system, where the
processes are known and can be monitored by complexity
probes. In the distributed case of a denial-of-service
attack, the process is not known, but bounds on the
differential complexity allowed by the distributed processes
can still be enforced.
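As a rough illustration of the bounding idea, the check below accumulates sampled complexity changes over an observation window and compares the total with the accumulated lower and upper bounds. It is only a sketch under stated assumptions: the sampling interval, the constant bounds, and the name `within_complexity_bounds` are hypothetical, and the sample values are invented for the example rather than taken from measurements.

```python
from typing import Sequence


def within_complexity_bounds(
    rates: Sequence[float],
    min_rate: float,
    max_rate: float,
    dt: float = 1.0,
) -> bool:
    """Return True if the accumulated complexity change lies strictly between
    the accumulated bounds for authorized processes.

    rates    -- observed complexity change per unit time, one sample per step
    min_rate -- assumed lower bound on the rate for authorized processes
    max_rate -- assumed upper bound on the rate for authorized processes
    dt       -- assumed sampling interval
    """
    observed = sum(rates) * dt          # rectangle-rule approximation of the integral
    window = len(rates) * dt
    return min_rate * window < observed < max_rate * window


# Hypothetical samples: an authorized process, then changes that are too small
# and too large to have come from an authorized process.
print(within_complexity_bounds([2.0, 3.0, 2.5], min_rate=1.0, max_rate=4.0))
print(within_complexity_bounds([0.1, 0.2, 0.1], min_rate=1.0, max_rate=4.0))
print(within_complexity_bounds([9.0, 9.0, 9.0], min_rate=1.0, max_rate=4.0))
```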

The property given by inequality (1) is exploited to
distinguish between concerted denial-of-service attacks and
cases of traffic overload. The assumption is that an
attacker performs an attack using large numbers of similar
packets (in terms of their type, destination address,
execution pattern, etc.) sourced from different locations
but intended for the same destination. Thus, there is a lot
of similarity in the traffic pattern. A Kolmogorov
complexity based detection algorithm can quickly identify
such a pattern. On the other hand, a case of legitimate
traffic overload in the network tends to have many different
traffic types. The traffic flows are not highly correlated
and appear to be random. Therefore, our algorithm samples
every distinct flow of packets (distinguished by their
source and destination addresses) to determine if there is a
large amount of correlation between the packets in a flow.
If so, all suspicious flows at the node are then correlated
with each other to determine whether it is indeed an attack
and not a case of traffic overload.

The architecture for DDoS detection has been implemented in
an active network for ease of deployment and flexibility in
testing. As shown in Figure 53, it consists of a packet
complexity probe (described in detail in the next section)
associated with every traffic flow through a node; the probe
periodically samples packets in the flow. For the collected
sample, the probe calculates the complexity differential.

Figure 53. DDoS detection architecture.

Complexity differential is defined as the difference between
the cumulative complexities of individual packets and the
total complexity computed when those packets are
concatenated to form a single packet. If packets
$x_1, x_2, x_3, \ldots, x_n$ have complexities
$K(x_1), K(x_2), K(x_3), \ldots, K(x_n)$, then the complexity
differential is computed as:

\[ [K(x_1) + K(x_2) + K(x_3) + \cdots + K(x_n)] - K(x_1 x_2 x_3 \ldots x_n), \]

where $K(x_1 x_2 x_3 \ldots x_n)$ is the complexity of the
packets concatenated together. If packets
$x_1, x_2, x_3, \ldots, x_n$ are completely random,
$K(x_1 x_2 x_3 \ldots x_n)$ will be equal to the sum of the
individual complexities and the complexity differential will
therefore be zero. However, if the packets are highly
correlated, i.e., some pattern emerges in their
concatenation, then the concatenated packet can be
represented by a smaller program and hence its complexity,
$K(x_1 x_2 x_3 \ldots x_n)$, will be smaller than the
cumulative complexity. In effect, we use the measure of the
compressibility of the packets accumulated in a given time
interval to determine correlation. If the complexity
differential is greater than a preset threshold for the
flow, the flow is marked as suspect and the collected sample
is referred to a Local Detector running on the node.

The Local Detector receives all such samples from the
various suspicious flows and correlates all the samples
together using the same complexity differential calculation.
If there is only one suspect flow, no correlation is
performed. If the complexity differential again exceeds the
threshold, all suspect flows (including the case of a single
flow) are referred to a Domain Detector that is running on
some other node in the local network domain. This hierarchy
of detectors cooperates to detect distributed
denial-of-service attacks within the network itself. This
hierarchy is shown in Figure 53.
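The probe and Local Detector steps described above can be sketched as follows. This is an illustrative reading, not the Magician implementation: it substitutes a `zlib`-based size estimate for the report's estimator, pools the individual packets of all suspect flows for the second-stage correlation (one possible interpretation of "correlates all the samples together"), and uses an arbitrary threshold of 300 bytes. All function names, packet contents, and flow names are hypothetical.

```python
import random
import zlib
from typing import Dict, List

# Size of an empty zlib stream, subtracted so that fixed container overhead
# is not counted as complexity for every small packet.
_OVERHEAD = len(zlib.compress(b"", 9))


def estimate(data: bytes) -> int:
    """Stand-in complexity estimate: compressed size minus container overhead."""
    return len(zlib.compress(data, 9)) - _OVERHEAD


def complexity_differential(packets: List[bytes]) -> int:
    """Sum of per-packet estimates minus the estimate of the concatenation."""
    if not packets:
        return 0
    return sum(estimate(p) for p in packets) - estimate(b"".join(packets))


def local_detector(flows: Dict[str, List[bytes]], threshold: int) -> List[str]:
    """Return the names of flows that would be referred to a Domain Detector."""
    suspects = {name: pkts for name, pkts in flows.items()
                if complexity_differential(pkts) > threshold}
    if len(suspects) <= 1:
        return sorted(suspects)        # single suspect flow: no correlation step
    pooled = [p for pkts in suspects.values() for p in pkts]
    if complexity_differential(pooled) > threshold:
        return sorted(suspects)
    return []


rng = random.Random(1)
attack = [b"ICMP echo request to 10.0.0.5 payload=AAAAAAAA"] * 20
background = [bytes(rng.getrandbits(8) for _ in range(40)) for _ in range(20)]

flows = {"attack-a": attack, "attack-b": list(attack), "background": background}
referred = local_detector(flows, threshold=300)
print(referred)
```

With these made-up samples the two flows of near-identical packets exceed the threshold individually and when pooled, while the flow of unrelated packets does not. In practice the threshold would have to be tuned to the estimator and the sample size.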

Complexity  estimates 

While it is known that, in general, Kolmogorov complexity is
not computable, various methods exist to compute estimates
of the complexity. The packet complexity probe described in
the previous section uses an entropy calculation technique
for estimation of complexity. The Kolmogorov Complexity
estimator, currently implemented as a simple compression
estimation method, returns an estimate of the smallest
compressed size of a string. The complexity K(x) is computed
using the entropy H(p) of the weight of ones in a string.
Specifically, K(x) is defined in Equation 1.A, where x#1 is
the number of 1 bits and x#0 is the number of 0 bits in the
string whose complexity is to be determined. Entropy H(p) is
defined in Equation 1.B. The expected complexity is
asymptotically related to entropy as shown in Equation 1.C.
See [10] for other measures of empirical entropy and their
relationship to Kolmogorov complexity.

\[ K(x) \approx l(x)\, H\!\left(\frac{x\#1}{x\#1 + x\#0}\right) + \log_2(l(x)) \tag{1.A} \]

\[ H(p) = -p \log_2 p - (1.0 - p) \log_2(1.0 - p) \tag{1.B} \]

\[ H(X) \approx \sum_{l(x)=n} P(X = x)\, K(x) \tag{1.C} \]

The complexity estimation technique used here is not the
best, because empirical entropy is actually a very poor
method of complexity estimation. For example, the estimate
for the string

101010101010101010101

and for a completely random string with the same numbers of
1's and 0's is the same under empirical entropy. More
accurate estimates of complexity will only serve to improve
our method for DDoS detection. See [162] for an innovative
and improved method for complexity measurement. In future
work, this technique will be used in the complexity probe
and the performance of the algorithm will be compared across
the two techniques.
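The weakness noted above is easy to reproduce. The sketch below assumes the estimator has the form l(x) H(x#1 / l(x)) + log2 l(x), with H the binary entropy function; that form is a reading of the entropy-based estimator described in this section, and the function names are hypothetical. The second string is simply a seeded shuffle of the first, so it has the same number of 1 bits and 0 bits.

```python
import math
import random


def binary_entropy(p: float) -> float:
    """H(p) = -p log2 p - (1 - p) log2 (1 - p), with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def entropy_estimate(bits: str) -> float:
    """Entropy-based complexity estimate of a bit string (assumed form)."""
    n = len(bits)
    p = bits.count("1") / n            # weight of ones in the string
    return n * binary_entropy(p) + math.log2(n)


regular = "101010101010101010101"
shuffled_bits = list(regular)
random.Random(0).shuffle(shuffled_bits)   # same bit counts, order scrambled
shuffled = "".join(shuffled_bits)

print(regular, round(entropy_estimate(regular), 3))
print(shuffled, round(entropy_estimate(shuffled), 3))
```

Because the estimate depends only on the length and the count of ones, both strings receive exactly the same value even though the first is far more regular.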

Experimental  results 

We compared our technique to a prototype packet-counting
algorithm for DDoS detection and found that our technique
better discriminates traffic patterns. We used our
Magician-based [159] active network [160] test bed for the
experiment for two reasons. First, it is quite easy to set
up a desired topology for the network, as well as to control
and measure performance, using an active network. Second, it
is easier to embed our complexity probes, which are written
in Java, inside the Java-based Magician kernel than to embed
them inside commercial routers. The results, however, can be
extrapolated to real traffic settings (Figure 54).


Figure  54.  Topology  for  experiment. 


The experimental setup consisted of a set of active nodes
arranged in the topology shown in Figure 54. Node AH-1
continuously generates traffic consisting of audio packets
destined for node AN-2. The load induced by this traffic is
high enough that it is registered at node AN-1 as a
'suspicious' flow, i.e., a traffic flow whose complexity
differential exceeds the threshold. The load induced by this
traffic flow is kept constant throughout the experiment.
Node AH-2 generates the attack flow. The load induced by the
attack flow is varied to determine the performance of the
algorithms. The experiment is run twice, once with only the
attack source on (node AH-2 transmitting only) and once with
both sources on (both node AH-1 and node AH-2 transmitting).
The rationale is that an attack is essentially a sustained
overload induced for some time interval. The purpose of the
experiment is to determine the effectiveness of the two
techniques in separating and identifying an attack in the
presence of background traffic.

Figure 55 and Figure 34 show the performance of the
packet-counting and complexity-based approaches as measured
against the load induced by the two sources (in packets per
second) described above. Figure 55 shows that the
packet-counting metric cannot discriminate between an attack
and a true overload. When the audio source is transmitting
in conjunction with the attack source, any threshold set by
the packet-counting algorithm running on node AN-1 will be
exceeded, leading to the false conclusion that the node is
under attack. For example, based on the attack pattern only
(blue curve), we decide to set the threshold at 70 packets/s
for a load of 0.6. When the audio source is introduced, the
combined traffic trips the same threshold at a load of only
0.4, which is a false positive. Figure 54 below shows the
complexity differential versus load curve for a given
sampled time interval, which in this case was 10 seconds.
The complexity-based metric does not change its behavior
when a combination of attack and traffic sources is used.
This is because the attack traffic dominates the combined
flow and hence the complexity differential is roughly equal
to that observed when only the attack flow existed.
Therefore, the complexity-based approach is more accurate in
separating false alarms from true attacks because it can
conserve salient patterns of a traffic flow.

Figure 55. Performance of packet-counting metric.
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3.  Kolmogorov  Complexity  as  a  Fundamental 
Metric Enabling Vulnerability Analysis


3.1 AUTOMATED DISCOVERY OF VULNERABILITIES WITHOUT A PRIORI
KNOWLEDGE OF VULNERABILITY TYPES

The design of the vulnerability analysis tool consists of
three logical layers, as shown in Figure 56.

Figure 56. Logical view of complexity-based vulnerability
analysis process.

Complexity measurement probes within the actual system form
the first layer. The results from the probes are used to
build a K-Map. The K-Map consists of a $|C| \times |C|$
matrix, where C is the set of information system components.
The matrix is a complete representation of "attack"
components crossed with target components. The diagonal
values are zero because the complexity of a component,
assuming the component has already been compromised, is
zero. Components that cannot physically be accessed from
another component have an infinite complexity value. Note
that the K-Map values change as an attack progresses. The
complexity values are updated using conditional Kolmogorov
Complexity estimates from Equation (1.2). This updates the
insecurity flow given that an attacker has partially
penetrated the system and has gained knowledge of the
compromised components. The result of this matrix can be
viewed as a complexity surface as shown in [57]. The top
layer consists of vulnerability states and transition values
obtained from layer two. Relative complexity estimates are
used to quantify the resistance to attack along the edges of
the graph, and the nodes are the states of an attack.
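A K-Map of the kind described above can be represented as a plain matrix, with zero on the diagonal and infinity for components that cannot be reached. The sketch below builds such a matrix and applies the Floyd-Warshall algorithm to find the lowest total complexity between every pair of components, which is one way the "resistance along the edges" could be analyzed. The component names follow the report's example system, but the numeric estimates and the function names are hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, List, Tuple

INF = math.inf


def build_kmap(components: List[str],
               estimates: Dict[Tuple[str, str], float]) -> List[List[float]]:
    """Build a |C| x |C| matrix: 0 on the diagonal, INF where no access exists."""
    index = {name: i for i, name in enumerate(components)}
    n = len(components)
    kmap = [[0.0 if i == j else INF for j in range(n)] for i in range(n)]
    for (attacker, target), value in estimates.items():
        kmap[index[attacker]][index[target]] = value
    return kmap


def min_complexity_paths(kmap: List[List[float]]) -> List[List[float]]:
    """Floyd-Warshall: lowest accumulated complexity between every pair."""
    n = len(kmap)
    dist = [row[:] for row in kmap]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


components = ["START", "B", "C", "E"]
# Hypothetical density estimates for the directed (attacker, target) pairs.
estimates = {("START", "B"): 1.5, ("START", "E"): 1.0,
             ("B", "C"): 1.25, ("E", "C"): 1.25}

kmap = build_kmap(components, estimates)
paths = min_complexity_paths(kmap)
for name, row in zip(components, paths):
    print(name, row)
```

In this made-up example the cheapest route from START to C runs through E, since the accumulated complexity along that path is lower than along the path through B.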

A model information system has been implemented in
Mathematica [102]. Mathematica provides an ideal environment
for experimenting with symbolic mathematical concepts and
algorithmic information theory in general. The goal is to
determine the vulnerability not only of the overall system,
but also of system components. Vulnerability analysis must
be possible without a priori knowledge about system
operation or knowledge of particular types of
vulnerabilities. Expert systems and vulnerability analysis
tools that rely upon rules identifying particular types of
vulnerabilities are inherently brittle. Such tools provide
good performance when known attacks are applied; however,
they fail catastrophically, and are therefore useless,
against an innovative attacker.

Every information system is assumed to take data of some
form as input, process the data, and return data as output.
Every information system can be defined as a mathematical
operation. Information systems developed by humans today
tend to be highly structured in order to be tractable in
their development and maintenance. Generally, there are
well-defined data flows and processing functions within the
information system. The system is composed of a hierarchical
composition of functional units. For these systems, one can
imagine complexity probes located at the input and output of
every functional unit in the system. This allows
determination of the vulnerability of each process and data
stream at a high degree of granularity, and provides a
complexity-based vulnerability map for the system. A
potential attacker would be unlikely to have such a detailed
understanding of a target information system. An
optimization to this technique is to limit probe locations
to only those locations likely to be observable to an
attacker.

System  under  evaluation:  the  active  network 

In the remainder of this paper, a specific example is used
to communicate the architecture and operation of the
vulnerability analysis framework. The specific example
focuses upon an active network [102] in which a distinction
is made between an active network and a legacy, or passive,
network. This environment is used to emphasize that
information assurance laws must be able to deal with many
alternative and dynamically changing representations of
information.

With regard to active packets and information theory,
passive data is simple compressible data; active packets are
a combination of data and program code whose efficiency can
be estimated by means of Kolmogorov Complexity. A brief
conceptual view of Kolmogorov Complexity for active network
packet optimization is demonstrated in Figure 57, in which
the same information is presented with varying proportions
of code to data.

[Figure 57 plot. Y-axis: information density. X-axis: length
(bytes).]

Figure 57. Same active packet information; varying
hypotheses (proportion of code to data).

The length of the information varies with the hypotheses
used to represent the information within the packet. The
shortest possible representation is the estimate of the
packet complexity [109]. The active network Kolmogorov
Complexity estimator is currently implemented as a quick and
simple compression estimation method. The Kolmogorov
Complexity estimator returns an estimate of the smallest
compressed size of a string. It is based upon computing the
entropy of the weight of one bits in a binary string.
Specifically, it is defined in Equation 6, where x#1 is the
number of 1 bits and x#0 is the number of 0 bits in the
string whose complexity is to be determined. Entropy is
defined in Equation 7. See [103] for other measures of
empirical entropy and their relationship to Kolmogorov
Complexity. The expected complexity is asymptotically
related to entropy as shown in Equation 8.

Observe an input sequence at the bit level and concatenate
it with an output sequence at the bit level. This
input/output concatenation is observed either for the entire
system or for components of the system. Low-complexity
input/output observations quantify the ease of understanding
by a potential attacker. Previous work has demonstrated the
use of Kolmogorov Complexity for Distributed Denial of
Service (DDoS) attack detection [104]. Definition 6
explicitly states the means of measuring the complexity of a
system component, or protocol interaction, to a potential
attacker.

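\[ K(x) \approx l(x)\, H\!\left(\frac{x\#1}{x\#1 + x\#0}\right) + \log_2(l(x)) \tag{6} \]

\[ H(p) = -p \log_2 p - (1.0 - p) \log_2(1.0 - p) \tag{7} \]

\[ H(X) \approx \sum_{l(x)=n} P(X = x)\, K(x) \tag{8} \]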


Definition 6: Complexity-based Vulnerability Metrics

Vulnerability is inversely proportional to
K(x[Opstart:Opend]) / l(x[Opstart:Opend]), where Opstart is
the bit at which an operation to be discovered within an
information system begins, and Opend is the last bit in an
attacker's observation.


In the remainder of the paper, excerpts from a Mathematica
Notebook are included. The excerpts contain code using
common mathematical and programming constructs, and should
therefore be understandable without knowledge specific to
Mathematica. Any Mathematica-specific details are explained
in the text. As a specific example of the algorithmic
capabilities of active networks, consider the transmission
of an estimate of π. One could choose to send π as an
extremely large number of digits. In contrast, one could
send a smaller algorithm capable of generating π to an
arbitrary number of digits. Consider an illustration of this
concept in more detail. The Mathematica code
{{#1/#2 &}, {22., 7.}} represents an unnamed function that
divides the first argument by the second argument; the
function implements 22/7. Consider that the code
({#1/#2 &}) and the data ({22., 7.}) remain unevaluated and
are transmitted together. This represents an active packet;
it contains part code and part data. The RUN function
evaluates the function and returns the result. The result in
this case is static data, a legacy data packet. Mathematica
code that analyzes the characteristics of algorithmic and
passive information transmission is shown in Figure 58.

res = AnetSim[{{{#1/#2 &}, {22., 7.}},
    {RUN[{{#1/#2 &}, {22., 7.}}]}},
   {{1, 2, 3, 4}, {4, 3, 2, 1}}, {100, 100, 1000, 10000}];

Figure 58. Static versus active information in the
Mathematica active network simulator.


The active packet is defined as {{#1/#2 &}, {22., 7.}},
which contains a pair of values and the code necessary to
perform division. The legacy, or passive, packet is defined
as RUN[{{#1/#2 &}, {22., 7.}}], which pre-computes the
result of the division and transmits the same information in
non-algorithmic form. The argument defined as
{{1, 2, 3, 4}, {4, 3, 2, 1}} identifies the links traversed
by the active and passive packets respectively. In this
case, the first packet begins by crossing link one and the
second packet begins by crossing link four. The argument
defined as {100, 100, 1000, 1000} indicates link capacities
for links one, two, three and four. Thus, the first packet
transmits both code and data that generate the intended
information, while the second packet transmits raw data
only. The result of executing the function is the load and
processing time spent on each link and node for each packet.
In Figure 59, the load induced by sending the estimate of π
using AnetSim in Figure 58 is plotted for each link.
Clearly, the algorithmic representation of the information
is more compact and uses less link capacity.

[Figure 59 plot. X-axis: link.]

Figure 59. Algorithmic versus static active network
information load.

In fact, this reinforces the point that by knowing how to
compute the information, one can build a more compact
representation. This demonstrates Occam's Razor applied to a
useful purpose, information compression. This has
facilitated study of active (algorithmic) versus passive
transmission of information. For example, we allow the ratio
of data to code to change for the same information as the
packet traverses the network in a manner that optimizes both
link capacity and node processor speed.
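The trade-off between algorithmic and passive representations can be illustrated outside Mathematica. The sketch below is not AnetSim; it is a toy model in which an "active packet" carries a short function as source text plus its arguments, and the "passive packet" carries the fully evaluated result. The class `ActivePacket`, the `run` helper, and the choice of 1000 decimal digits of 22/7 are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ActivePacket:
    code: str               # source text of the function carried by the packet
    data: Tuple[int, ...]   # arguments carried alongside the code

    def size(self) -> int:
        """Bytes needed to carry the code and data in textual form."""
        return len(self.code.encode()) + len(repr(self.data).encode())


def run(packet: ActivePacket) -> str:
    """Evaluate the packet's code on its data, as a receiving node might."""
    function = eval(packet.code)   # acceptable only for this trusted toy example
    return function(*packet.data)


DIGITS = 1000
# The code scales 22/7 by 10**n and keeps the integer part, i.e. the leading
# digit followed by n decimal digits, with the decimal point omitted.
active = ActivePacket("lambda a, b, n: str(a * 10**n // b)", (22, 7, DIGITS))
passive = run(active)              # the pre-computed, non-algorithmic form

print("active packet size: ", active.size(), "bytes")
print("passive packet size:", len(passive), "bytes")
```

The active form stays a few dozen bytes no matter how many digits are requested, whereas the passive form grows with the number of digits, at the cost of the receiving node having to execute the code.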

Complexity  surface:  the  Kolmogorov 
Complexity  map 

The GE Global Research active network test bed implements
complexity probes as part of the active execution
environment. The choice was made to embed the complexity
probe in the execution environment rather than as an active
application because it is necessary to examine the content
of active packets before they reach the execution
environment. In the Mathematica simulation, each component
of the active application contains probe-input points
through which bit-level input and output is collected. A
complexity estimator based upon the simple inverse
compression ratio from Equation (1.4) is used to estimate
complexity in the density metric. The graphs in Figure 60
and Figure 61 result from density estimates taken of the
accumulated input and output of three separate components of
the active network application. The graphs show the
complexity of bit-level input and output strings
concatenated together. That is, every input sequence is
concatenated with an output sequence and the density of the
sequence is recorded at bit level.

Figure 61. Mean component complexities for B, C and E.

The input/output concatenation is generated either for
individual components of the system or for a composition of
components. If there is low complexity in the input/output
observation pairs, then it is likely to be easy for an
attacker to understand the system. The X-axis is the number
of input and output observations concatenated to form a
single string of bits. From Figure 61, it would appear that
Component E is most vulnerable due to its consistently low
complexity, while Component B appears to be the least
vulnerable due to its larger complexity. These results make
intuitive sense because Component E simply forwards data
without any form of protection while Component B adds noise
to the data. This vulnerability method does not take into
account whether a component reduces or increases complexity,
in other words, whether the change in complexity was
endothermic or exothermic. These results demonstrate how
vulnerabilities are systematically discovered using
complexity. Vulnerabilities can be quantified to a value
within the bounds of the complexity measure error.
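The density comparison described above can be mimicked with two toy components: one that forwards its input unchanged and one that masks it with noise. The sketch below is an illustration only. It estimates density as compressed size divided by original size using `zlib`, which is a stand-in for the report's estimator, and the component functions, the sample input, and the names `density` and `observe` are hypothetical.

```python
import random
import zlib
from typing import Callable, List


def density(observation: bytes) -> float:
    """Estimated K(x)/l(x): compressed size over original size (zlib stand-in)."""
    return len(zlib.compress(observation, 9)) / len(observation)


def observe(component: Callable[[bytes], bytes], inputs: List[bytes]) -> bytes:
    """Concatenate every input with the output the component produces for it."""
    return b"".join(x + component(x) for x in inputs)


def forward(x: bytes) -> bytes:
    """Toy component that forwards data unchanged (no protection)."""
    return x


noise_source = random.Random(0)


def add_noise(x: bytes) -> bytes:
    """Toy component that masks each byte with pseudo-random noise."""
    return bytes(b ^ noise_source.getrandbits(8) for b in x)


inputs = [b"GET /index.html HTTP/1.0\r\n" for _ in range(50)]
forward_density = density(observe(forward, inputs))
noisy_density = density(observe(add_noise, inputs))

print(f"forwarding component density: {forward_density:.3f}")
print(f"noise-adding component density: {noisy_density:.3f}")
```

Under the inverse relationship stated in Definition 6, the forwarding component, with its much lower density, would be ranked as the more vulnerable of the two.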

In  order  to  develop  the  Kolmogorov  Complexity 
Map  (K-Map),  consider  the  topology  in  more  detail. 
Figure  62  shows  the  resulting  densities  inserted  into 


gnp = Graph[KMap, Range[Length[KMap]]]

Graph[{{∞, 1.17693, ∞, 1.00975}, {∞, ∞, 1.1074,
  {∞, ∞, ∞, ∞}, {∞, ∞, 1.1074, ∞}}, {1, 2, 3, 4}]

Figure 62. Kolmogorov Complexity map (K-Map).


a Mathematica graph object. The graph object allows
graph-theoretic analyses to be applied. The directed graph
in Figure 63 shows the relationship among the
vulnerabilities. The START state, located in the center of
the topology, represents a location outside the system. In
Figure 64, a matrix is generated that shows the cost, in
terms of complexity, of traveling from any node to any other
node in the K-Map. In Figure 65, the function CoordVul
computes a maximum flow through the K-Map graph using the

p3  =  ShowLabeledGraph[g,  {START,  B,  C,  E}]  ; 


Figure 63. System under analysis: components and topology.


MatrixForm[AllPairsShortestPath[g]]

Figure 64. Minimum complexity paths matrix.


CoordVul[g_, x_, y_] := Module[{s},
  Return[Sum[NetworkFlow[g, s, xy2node[g, x, y][[1]]],
    {s, 1, Length[g[[2]]]}]]
]

Figure  65.  Insecurity  flow  graph. 

node positions as shown in Figure 63. Density
(K(x)/l(x)) acts as a resistance, while its inverse acts
as conductance, supporting insecurity flows as
illustrated in Figure 66. The resulting flow matrix in
Figure 67 shows the maximum flow through each
link. Figure 68 shows the complexity surface of the
resulting flows. Higher areas correspond to less
vulnerable states, while lower areas correspond to more
vulnerable states. Note that in the following contour
maps, areas of infinite height are simply shown
without a surface. By comparing Figure 63 and
Figure 68, it is apparent that the START state, the
infinite mountain in the center of the topology, is

Figure 66. Grid-based representation of information assurance.

{X, -1., 1., .3}, {Y, -1., 1., .3}]]

3.65124   3.65124   3.65124   3.65124   3.65124    3.65124    3.65124
3.65124   3.65124   3.65124   3.65124   3.65124    3.65124    3.65124
1.04669   1.04669   3.65124   3.65124   3.65124    3.65124    0.896949
1.04669   1.04669   0         0         0          0.896949   0.896949
1.04669   1.04669   1.04669   0         0          0.896949   0.896949
1.04669   1.04669   1.04669   1.04669   0.896949   0.896949   0.896949
1.04669   1.04669   1.04669   1.04669   0.896949   0.896949   0.896949

Figure  67.  Flow  results  matrix. 


Figure  68.  Complexity  surface  for  system  in  Figure  84. 

invulnerable, which makes intuitive sense. State E is
the weakest individual component and the lowest area
on the right side. Note that while State C cannot be
directly attacked from the START state, it can be
attacked via states B and E, located in the upper and
lower right side of the figure respectively. Thus, B
and E have a relatively intermediate level of
vulnerability. In the insecurity flow contour shown in
Figure 69, density is resistance and all possible flows
from and to every node are summed to obtain an
insecurity level. While Node C is assigned infinite
complexity as shown in Figure 68, it actually is the most
insecure component given that flows exist from
Nodes B and E.

3.2  A  PRIORI  VULNERABILITY  ANALYSIS:  THE
NETWORK  INSECURITY  PATH  ANALYSIS
TOOL

The Network Insecurity Path Analysis Tool (NIPAT)
[105], like many security tools, assumed a priori
knowledge of vulnerabilities. It then estimated
security flow by assigning probabilities based upon the
number of opportunities for an attacker to advance
from one vulnerability to another. An example of
NIPAT operation is shown in Figure 70. In this figure,
2,000 a priori defined vulnerabilities found on a
few nodes of a network that were thought to be
reasonably secure are displayed. The hosts upon which
vulnerabilities reside and the a priori defined type of
vulnerability are displayed. The number along each
edge of the graph represents the number of
opportunities available to the attacker to reach the next
vulnerability. This information is gathered from
network security software agents that are pre-programmed
to identify predefined types of vulnerabilities. The
security vulnerability graph for a typical network can
be extremely dense; however, the object-oriented
nature of the security model is useful in choosing the
level of abstraction required. For example, it may be
possible to display the vulnerability graph for Unix
hosts in general and to hide the details of individual
Unix variants. NIPAT determines the degree to which
specified targets within the network can be
compromised. The vulnerability chain is displayed as a
directed graph. Nodes represent vulnerabilities whose
security may be compromised, and edges represent
paths from vulnerability to vulnerability. The larger
the value of the edge label, the greater the
vulnerability. The focus of this effort is on the
mathematical representation of information assurance;
thus, the underlying database and data gathering agents
are not discussed in detail in this paper.

Figure 70. A grid-based tool action.

Two algorithms are demonstrated; the first is a
probabilistic analysis and the second is a maximum
flow analysis. Let us start with the probabilistic
analysis. Select a node to be the target of the attack; in
this case we have selected Host C Vuln 4. Select a
specific attack entry point anywhere in the system
and add the attacker to the graph. A text window
appears stating the probability of successful attack
(0.729) followed by the graph shown in Figure 71
that shows the most probable path of attack
highlighted. The analysis is re-executed using the
maximal flow algorithm. Host C Vuln 4 is again selected
as the target. A text window appears displaying the
maximum flow (6.0) as well as detailed graphical
results shown in Figure 72. The edge values have
been changed to show the maximum flow along
each edge towards the target node. In this case there
is a flow of 1.0 and a flow of 5.0 that can reach the
target node.


Figure  71.  Most  likely  attack  path.


Figure  72.  Maximum  flow  paths. 


Complexity-based  insecurity  flow 

Assigning a probability of exploitation to
vulnerabilities based upon the assumption that all
vulnerabilities can be explicitly discovered a priori and
placed in a data or knowledge base is a fallacy for
several reasons. First, a brute force approach that
attempts to explicitly identify all possible
vulnerabilities a priori is highly system dependent and
results in a combinatorial explosion. Second, assigning
a level of effort required to exploit vulnerabilities is
highly subjective. Third, failure to identify even a
single vulnerability can result in catastrophic
performance failure. Such a brute force technique is
very brittle, as shown in the next section. Fourth, once
such inaccurately quantified probabilities have been
assigned, the probabilistic mechanism is an unsuitable
technique. For example, it does not follow that
composing components serially results in a
probability of successful attack equal to the
product of the individual probabilities. Mutual
information between the components can result in a much
higher probability of successful attack.
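
The fourth point can be made concrete with a small calculation. The sketch below is illustrative only: the two joint distributions are hypothetical and are not taken from NIPAT. Each describes two serially composed components whose individual probability of compromise is 0.5. When the components are independent, the product rule holds; when they share the same flaw, the mutual information is one bit and the probability of compromising both is twice what the product rule predicts.

```python
import math

def marginals(joint):
    """Return P(first compromised), P(second compromised) from a joint table."""
    p_first = sum(p for (a, _), p in joint.items() if a)
    p_second = sum(p for (_, b), p in joint.items() if b)
    return p_first, p_second

def mutual_information(joint):
    """Mutual information, in bits, between the two component outcomes."""
    p_first, p_second = marginals(joint)
    total = 0.0
    for (a, b), p in joint.items():
        if p > 0:
            pa = p_first if a else 1.0 - p_first
            pb = p_second if b else 1.0 - p_second
            total += p * math.log2(p / (pa * pb))
    return total

# Hypothetical joint distributions: keys are (first compromised, second compromised).
independent = {(True, True): 0.25, (True, False): 0.25,
               (False, True): 0.25, (False, False): 0.25}
shared_flaw = {(True, True): 0.5, (True, False): 0.0,
               (False, True): 0.0, (False, False): 0.5}

for name, joint in (("independent", independent), ("shared flaw", shared_flaw)):
    p1, p2 = marginals(joint)
    print(name, "product rule:", p1 * p2,
          "actual:", joint[(True, True)],
          "mutual information (bits):", mutual_information(joint))
```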

Using results from the K-Map described in the
previous section, it is possible to address these
problems by retrofitting NIPAT to use the complexity-based
vulnerability framework. Both a most likely
path and a maximum flow algorithm are applied in
this experimental complexity-based vulnerability
analysis tool. The most likely path is determined by
finding the lowest complexity path from a given
attack point to a given target point. The maximum
flow algorithm assumes that lower complexity paths
have a greater capacity.
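
The two algorithms can be sketched as follows. This is a minimal illustration, not the NIPAT implementation: it uses Dijkstra's algorithm for the lowest complexity path and the Edmonds-Karp method for maximum flow, with link capacity taken as the inverse of link density. The link densities are those legible in Figure 62; the node names and the assumption that Component B links only to Component C are ours.

```python
import heapq
from collections import deque

INF = float("inf")

# Link densities (K/l) read from Figure 62; missing links have infinite complexity.
K_MAP = {
    "START": {"B": 1.17693, "E": 1.00975},
    "B": {"C": 1.1074},   # assumption: B links only to C
    "C": {},
    "E": {"C": 1.1074},
}

def lowest_complexity_path(graph, source, target):
    """Dijkstra: return (total complexity, node list) of the cheapest path."""
    dist, prev, heap = {source: 0.0}, {}, [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            break
        if d > dist.get(node, INF):
            continue  # stale heap entry
        for nxt, weight in graph[node].items():
            if d + weight < dist.get(nxt, INF):
                dist[nxt], prev[nxt] = d + weight, node
                heapq.heappush(heap, (d + weight, nxt))
    if target not in dist:
        return INF, []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]

def max_insecurity_flow(graph, source, target):
    """Edmonds-Karp maximum flow with capacity = 1 / density on each link."""
    residual = {u: {} for u in graph}
    for u, links in graph.items():
        for v, density in links.items():
            residual[u][v] = residual[u].get(v, 0.0) + 1.0 / density
            residual[v].setdefault(u, 0.0)
    total = 0.0
    while True:
        parent, queue = {source: None}, deque([source])
        while queue and target not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if target not in parent:
            return total  # no augmenting path remains
        bottleneck, v = INF, target
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = target
        while parent[v] is not None:
            residual[parent[v]][v] -= bottleneck
            residual[v][parent[v]] += bottleneck
            v = parent[v]
        total += bottleneck

cost, path = lowest_complexity_path(K_MAP, "START", "C")
print("most likely path:", path, "complexity:", cost)
print("maximum insecurity flow:", max_insecurity_flow(K_MAP, "START", "C"))
```

With these values the lowest complexity route to Component C runs through Component E, which agrees with the earlier observation that E is the weakest individual component.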

The question arises as to what flow means in terms
of complexity. First, the entire foundation of
complexity-based vulnerability analysis rests upon the
likelihood, or probability, of attack being successful
upon the low complexity locations of an information
system as per Definition 6. The complexity probe
values are displayed as links in the complexity tool
display shown in Figure 71. The values of the links
are l(n)/K(n) and these values are normalized to 1.0
for each node in order to obtain a probability of
successful attack upon each link. The maximum flow
algorithm provided by this tool indicates not only
the vulnerability of each component, but also the
optimal placement of resources by an attacker to
maximize the likelihood of a successful attack.
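
The normalization step can be sketched as below. This is one reading of "normalized to 1.0 for each node": the l(n)/K(n) values on the links leaving a node are scaled so that they sum to one. That interpretation, the function name, and the node names are assumptions; the densities are those legible in Figure 62.

```python
def attack_probabilities(link_densities):
    """Convert per-link densities K/l into per-node attack probabilities.

    Each link value is the inverse density l/K; the values leaving a node
    are scaled to sum to 1.0 (assumed meaning of the normalization).
    """
    probabilities = {}
    for node, links in link_densities.items():
        inverse = {nbr: 1.0 / density for nbr, density in links.items()}
        total = sum(inverse.values())
        probabilities[node] = (
            {nbr: value / total for nbr, value in inverse.items()} if total else {}
        )
    return probabilities

# Densities (K/l) from Figure 62; the link set for B is an assumption.
LINK_DENSITIES = {
    "START": {"B": 1.17693, "E": 1.00975},
    "B": {"C": 1.1074},
    "C": {},
    "E": {"C": 1.1074},
}

for node, links in attack_probabilities(LINK_DENSITIES).items():
    print(node, links)
```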

Safeguard  optimization  techniques 

We assume that vulnerability has been calculated by
NIPAT to be either the maximum insecurity flow or the
probability of successful attack. Let S represent the
security safeguards, C(S) the cost of security, and L
the cost constraint or some other hard resource
limit. The cost in terms of impact on users is
discussed next; here, cost is strictly a financial cost or
other resource constraint. Objective Function 1 shows
how the optimal security safeguard allocations can be
determined, where V(S) is l(S)/K(S). Let CS represent
the network service to customers, with a minimum
accepted quality, Q. Let V(S,A) be the vulnerability of
the network to a particular attacker, A. Then
Objective Function 2 shows the optimal network response
given the current state of the attack.

Objective Function 1 (Vulnerability and Cost):

$$ \min V(S) \quad \text{s.t.} \quad C(S) < L $$

Objective Function 2 (Vulnerability and Cost while Maintaining QoS):

$$ \min V(S, A) \quad \text{s.t.} \quad CS > Q, \quad C(S) < L $$

It is possible to use NIPAT to study various
strategies of both the defensive and offensive players in a
network attack. Once an attack has been detected,
the network command and control center can
respond to the attack by repositioning security
safeguards and by modifying services used by the
attacker. However, cutting off services to the attacker
also impacts legitimate network users. A careful
balance must be maintained between minimizing the
threat from the attack and maximizing service to
customers.
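
Objective Function 1 can be illustrated with an exhaustive search over safeguard subsets. This is a sketch under stated assumptions, not the method used in NIPAT: the safeguard names, costs, and complexity contributions are hypothetical, and the complexity K(S) is assumed to grow additively with each deployed safeguard while the length l stays fixed.

```python
from itertools import combinations

def optimal_safeguards(safeguards, length, base_complexity, budget):
    """Minimize V(S) = l / K(S) subject to C(S) < budget by brute force.

    safeguards maps a name to (cost, complexity added); both are hypothetical.
    Returns (vulnerability, cost, chosen safeguard names).
    """
    best = (length / base_complexity, 0.0, ())  # deploying nothing is allowed
    names = sorted(safeguards)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            cost = sum(safeguards[name][0] for name in subset)
            if cost >= budget:  # constraint C(S) < L
                continue
            complexity = base_complexity + sum(safeguards[name][1] for name in subset)
            vulnerability = length / complexity
            if vulnerability < best[0]:
                best = (vulnerability, cost, subset)
    return best

# Hypothetical safeguards: name -> (cost, complexity added in bits).
SAFEGUARDS = {
    "firewall": (4.0, 300.0),
    "encryption": (6.0, 500.0),
    "audit": (3.0, 150.0),
}

vulnerability, cost, chosen = optimal_safeguards(
    SAFEGUARDS, length=1000.0, base_complexity=200.0, budget=10.0
)
print(chosen, cost, vulnerability)
```

Exhaustive search is only practical for a handful of safeguards; it is used here because it makes the objective and the constraint explicit.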

The distribution of insecurity information

Another dimension of vulnerability analysis involves
detecting vulnerabilities that change over time. The
network monitoring tool quantifies the vulnerability of
a system in terms of the percentage of patches that fail
to have the correct signature, the percentage of files
that are accessible to others besides the owner, and
the percentage of passwords that can be guessed with a
given password generation tool. Clearly, vulnerability
checks such as these increase the security of the
network. Both the type of information gathered and the
frequency with which the information is updated
quantify the effectiveness of a network monitoring
strategy. If the information is not updated frequently
enough, an attacker may have penetrated network
security and left before network security is aware of
the situation.

An estimate of the effectiveness of the monitoring
system is based on a profile of network security
attacks on the Internet and the following
parameters: time to monitor patches, Trojan horses,
passwords, and any other vulnerabilities. The average
attack rate, based on Internet incident reports from
an anonymous site for a six-year period, is five
attacks per month. Additionally, the Defense
Information Systems Agency has determined by
experimental means [107] that only 0.7 percent of
incidents are actually reported. Thus, for each path
in the network security vulnerability chain, the cost
to the attacker is the probability of being detected
multiplied by the cost function that the additional
monitoring provides.
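
As a rough worked illustration (an inference from the two figures quoted above, not a number stated in the source), if the five reported attacks per month represent only 0.7 percent of all incidents, the implied underlying rate is:

```latex
\lambda_{\text{actual}} \approx \frac{\lambda_{\text{reported}}}{0.007}
  = \frac{5}{0.007} \approx 714 \ \text{attacks per month}
```

This assumes that the reporting fraction measured in [107] applies uniformly to the site from which the incident reports were drawn.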

The approach to measuring the complexity of a
system results in determining the ease with which a
potential attacker can understand the system. It does
not directly account for the fact that information
about the target system can be obtained by a
potential attacker in algorithmic form, that is, in the form
of an attack tool. Such a tool does not require the
attacker to understand its operation. The attack tool
is like an active packet, or a parasite that depends
upon its host for transportation. This is distinct from
a virus, whose primary function is replication and
transport. For example, an attacker may have little
understanding of a particular system, yet the
attacker may download an attack tool that enables a
successful attack. Thus, the distribution of attack
knowledge needs to be considered. Once an attack
tool is in the hands of an attacker, the apparent
complexity is greatly reduced. There is an interesting
feedback mechanism here; data that can reduce the
apparent complexity to a potential attacker needs to
be kept secure by the defender. Once obtained by an
attacker, a significant drop in apparent complexity
occurs, potentially leading to further significant
reduction in apparent complexity as more
vulnerability information is obtained and disseminated to
other attackers.

One might view the evolution of complexity in
the following terms. An information system is built.
Initially, an attacker discovers its least complex
components. The attacker decides to automate his attack
(active) and/or publish the mechanism to
accomplish the attack (passive). This information is
disseminated through the population. Meanwhile the
information system defenders, usually after
considerable delay, discover the attack mechanism and
patch the hole. The population of attackers,
building upon their knowledge, exploits the next least
complex link, from their view, in the information system.
The defenders eventually close this hole.

The cycle continues ad infinitum. The cycle of
attack and defense can be viewed through
complexity as a cycle, or evolution of complexity. Low
complexity portions of a system will eventually be
learned and disseminated by an attacker. To account
for this dissemination of low complexity
information, defenders reinforce the low complexity areas
with more complexity. The results of this project
allow system developers to understand not only
where the vulnerable portions of the system are
located, but to engineer their systems in such a
manner as to control the cycle. This process can be
modeled as low complexity portions of an information
system that evolve in complexity over time.


3.3  INTRODUCTION 

The vulnerability analysis technique presented in
this paper takes into account the innovation of an
attacker attempting to compromise an information
system. A metric for innovation is not new; William
of Occam suggested a technique 700 years ago [94].
The salient point of Occam's Razor and complexity-based
vulnerability analysis is that the better one
understands a phenomenon, the more concisely the
phenomenon can be described. This is the essence
of the goal of science: develop theories that require
a minimal amount of information to be fully
described. Ideally, all the knowledge required to
describe a phenomenon can be algorithmically
contained in formulae, and formulae that are larger
than necessary reflect a lack of full understanding of
the phenomenon. The ability of an attacker to
understand, and thus successfully innovate a new attack
against a system component, is directly related to the
size of the minimal description of that component.

Consider an information system attacker as a
scientist trying to learn more about his environment,
that is, the target system. Parasitic computing [95] is
a literal example of a scientist studying the
operation of a communication network and utilizing its
design to his advantage in an unintended manner.
The attacker as scientist generates hypotheses and
theorems. Theorems are the attacker's attempts to
increase understanding of a system by assigning a
cause to an event, rather than assuming all events
are randomly generated. If theorem x, described in
bits, is of length l(x), then a theorem of length l(m),
where l(m) is much less than l(x), is not only much
more compact, but also 2^(l(x) - l(m)) times more likely
to be the actual cause than pure chance [94]. Thus, the
more compactly a theorem can be stated, the more
likely the attacker is to be able to determine the true
underlying cause described by the theorem.
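
The likelihood factor follows from a short calculation, assuming (as is standard in algorithmic information theory) that a description of length l bits is weighted by 2^{-l}. The lengths 100 and 90 below are hypothetical values chosen only to show the scale of the effect:

```latex
\frac{2^{-l(m)}}{2^{-l(x)}} = 2^{\,l(x) - l(m)},
\qquad l(x) = 100,\ l(m) = 90 \;\Rightarrow\; 2^{10} = 1024
```

Under that assumption, shortening a theorem by only ten bits makes it roughly a thousand times more likely to be the actual cause.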

Motivation

Imagine a vulnerability identification process that
consisted of the following: First, wait for an
information system to be attacked. Then analyze the attack,
assuming, of course, the system survives the attack,
can still be trusted and the attack can even be
detected. Finally, if the information system is still not
compromised, add the attack information to one's
knowledge base.

This technique would be unacceptable to most
people, but it is essentially the vulnerability analysis
technique used today. Information assurance, and
vulnerability analysis in particular, are difficult
problems primarily because they involve the application
of the scientific method by a defender to determine
a means of evaluating and thwarting the scientific
method applied by an attacker. This self-reference of
scientific methods would seem to imply a non-halting
cycle of hypothesis and experimental validation
being applied by both offensive and defensive
entities, each affecting the operation of the other.
Information assurance depends upon the ability to
discover the relationships governing this cycle and
then quantifying and measuring the progress made
by both an attacker and defender.

A metric and framework are required for
quantifying information assurance in an environment of
escalating knowledge and innovation. Progress in
vulnerability analysis and information assurance
research cannot proceed without fundamental
metrics. The metrics should identify and quantify
fundamental characteristics of information in order to
guarantee assurance. A fundamental definition of
vulnerability analysis is formulated in this paper
based upon attacker and defender as reasoning
entities, both capable of innovation. Truly innovative
implementations of attack and defense lead to the
evolution of complexity in an information system.
Understanding the evolution of complexity in a
system enables a better understanding of where to
measure, and how to quantify, vulnerability. In turn, this
leads towards a calculus of information complexity.
The design and implementation of a complexity-based
technique is presented as a vulnerability
analysis tool for automated measurement of information
assurance. The motivation for complexity-based
vulnerability analysis comes from the fact that
complexity is a fundamental property of information and can
be universally applied.

Components  of  the  analysis 

The presentation and analysis of a Kolmogorov
Complexity-based vulnerability analysis framework
must accomplish several goals. As initially stated, the
vulnerability analysis technique must demonstrate
the ability to account for the innovation of an
attacker. The presentation should also discuss the
relationship to previously defined properties of
security. The technique should be based upon
fundamental properties of information, rather than suffer
from the combinatorial explosion that occurs when
explicitly examining all possible events generated by
specific systems. The vulnerability results should
make intuitive sense; vulnerability is reduced by
increasing the apparent complexity of access to
information for potential attackers, while
vulnerability increases along less complicated, or in some
sense shortest, paths of access to information. In
other words, low complexity implies high
vulnerability and high complexity implies low vulnerability.
The results should not only be intuitively clear, but
should support the rigorous definition of a metric
space.

Once this has been shown, a topological view of
vulnerability can be demonstrated. This is
demonstrated by means of a Kolmogorov Complexity Map
(K-Map) in which low complexity paths, which are
likely to be easy for an attacker to follow, are
identified. The concept of a K-Map, or complexity grid, is
shown in Figure 73 and the K-Map for a specific
example is derived later in this paper.

Figure 73. Conceptual view of a vulnerability and attack
detection complexity grid.

Figure 73 may itself appear quite complex upon
first glance; however, focus upon individual parts of
the figure in a logical progression. Begin with the
information to be protected, which lies at the
bottom of the figure. Attacks are illustrated as the thin,
downward-pointing arrows attempting to penetrate
the system in order to manipulate the information.
Numerous safeguards, supposedly designed to
protect the information and each designed to mitigate
particular types of attack, are shown as barriers with
various levels of porosity (inserted across the middle
of the figure). The overall complexity of the system
is illustrated by the surface contour located above
the information and safeguards. The complexity of
the system as a whole is comprised of the complexity
of several entities, namely: the information itself, the
complexity of the system in which the information
resides and the complexity of the safeguards.
Innovative attacks will be more likely to successfully
penetrate those areas of low complexity with easier to
comprehend components of the system.

In addition, specific types of attacks, such as
Distributed Denial of Service (DDoS), will appear as
warps in the complexity grid. This is due to the
inherent system correlation in DDoS attack streams. The
vulnerability analysis technique should be applicable
in a highly dynamic and amorphous information
environment. An active network environment is
chosen because information can be transmitted through
an active network while its proportion of algorithmic
content varies. In other words, static data,
executable code or various combinations of both can
represent information. In addition, both forms of
information should have high assurance. The
assurance of their interaction at a low level within an
active network presents a nice challenge.

An example application of vulnerability analysis
should be demonstrated to validate the feasibility of
the framework. This paper ends by demonstrating
several applications enabled by the new vulnerability
analysis framework. The first application of
vulnerability analysis shows that the complexity-based
vulnerability framework enables Brittle Systems
analysis. Brittle Systems analysis can be applied to
understand the trade-off in performance versus
failure of security. Finally, another application shows
that complexity-based vulnerability analysis enables
the optimization of security safeguards.

Properties  of  security 

There have been many attempts to define security
models that facilitate the proof of security
properties [96]. The results in this paper focus upon
what has been termed probabilistic, rather than
possibilistic, security. Possibilistic security is concerned
with proofs that given security properties can never
be violated, while probabilistic security is concerned
with estimating the likelihood that properties will be
violated. The quantification of the insecurity that
results from the successful exploitation of areas of
weak security is referred to in this paper as
vulnerability.


The security framework generally assumes that
there are low-level and high-level users within a
system. The intuitive notion is that high-level users
should be secure from low-level users. Security
properties include non-inference: low-level users should
not be able to infer information about high-level
users; non-interference: high-level users are prevented
from influencing the behavior of low-level users
(otherwise, low-level users could infer information
about high-level user activity); non-deducible output:
low-level users cannot distinguish the events causing
high-level users' output; and finally separability: no
interaction or information flow is allowed between
low-level and high-level users. Separability is too strong a
security property because it does not allow low-level
users to interfere with high-level users. This type of
interference is acceptable, since it is assumed that
information flow is allowed from low-level to
high-level users. The perfect security property allows
information to flow only from low-level to high-level users.

While in theory these properties are useful in
attempting to prove that a system is secure,
anecdotal evidence suggests that few developers will
expend the effort required to ensure that their
systems meet these properties. The number of events
that must be verified for possibilistic security results
in a combinatorial explosion. In contrast, this work
attempts to develop a quantification of the degree to
which a system has achieved perfect security using
fundamental properties of information, rather than
proving perfect security. Security properties such as
non-inference, non-interference, non-deducible
output, and separability define various mechanisms
by which information flow, that is, information that
could be inferred by one class of user about another
class of user, is prevented.

Similarly to previous work in this area, results in
this work are based upon information flow
generated by a low-level user, referred to as an attacker,
inferring information about higher-level users. It is
assumed that security is not discrete, but varies
throughout a system and that attackers will want to
follow paths of least resistance to obtain their
objective. That is, an attacker will choose paths of least
resistance with the possible constraint of optimizing
for stealth or speed of attack. Probabilistic security
has been explored in the past; however, obtaining
values for probabilities of insecurity has generally
been ill defined. This paper uses Kolmogorov
Complexity [97] as an underlying means to estimate
insecurity probabilities.

3.4  VULNERABILITY  METRICS  WITH
PHYSICAL  ANALOGS

Vulnerability is generally defined as the probability
of a successful attack multiplied by the damage done
by the attack. The paper focuses on predicting the
probability of successful attack against an
information system using fundamental properties of
information. Information properties with physical
analogs are explored because (1) they are likely to
yield laws of information that are fundamental to
information, not specific to individual systems, (2)
the properties provide deeper insight to
information assurance, and (3) they can be universally
applied. Information properties that have physical
analogs and that are candidates for fundamental
parameters upon which to build information
assurance techniques are briefly discussed in this section.

Volume 

In his groundbreaking 1949 paper, Shannon
introduced fundamental trade-offs and limitations on the
ability to transmit information across a channel
disturbed by Additive White Gaussian Noise
(AWGN) [98]. This launched the science of
information theory that has transformed the study of
communications and coding of information. It also
brought the term "bit" of information, which Shannon
credits to J. W. Tukey, into the mainstream
literature.

The idea that information can be quantized into
bits (or sequences of yes or no answers to questions)
is now well accepted, and one measure of the size of
information is the number of bits used to convey the
information. Information compression coding,
both lossless and lossy, as well as forward error
correction coding, alters the size of the information in
terms of bits by removing or adding redundancy.
However, the unit of size, bits, is the term used to
discuss the size of information, whether it is efficiently
coded, error prone or self-correcting. Thus, while it
is possible for information to change size without
altering content, size is a fundamental property of
information.

Entropy 

Shannon entropy [98], also a fundamental property
of information, measures the uncertainty of a
random variable X based on the probabilities of each
outcome. The entropy of a distribution defines the
average per symbol compression bound in bits per
symbol using a prefix free code. Entropy is derived
from a given source distribution p of l symbols as
shown in Equation (9). Kolmogorov Complexity, to
be discussed in detail later, is estimated from an
individual sequence of information. These two
parameters are extremely powerful properties of
information that occur at a fundamental level.

$$H(X) = -\sum_{i=1}^{l} p_i \log_2(p_i) \qquad (9)$$
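The entropy formula in Equation (9) can be sketched in a few lines of Python. This is a minimal illustration, not code from the report; the function names `shannon_entropy` and `empirical_entropy` are assumptions introduced here, and the empirical version simply treats observed symbol frequencies as the source distribution.

```python
import math
from collections import Counter

def shannon_entropy(probabilities):
    """Return H(X) = -sum(p_i * log2(p_i)) in bits per symbol.

    Outcomes with zero probability contribute nothing to the sum.
    """
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def empirical_entropy(sequence):
    """Estimate entropy from the symbol frequencies of one observed sequence.

    Assumption: the observed frequencies stand in for the true distribution.
    """
    counts = Counter(sequence)
    total = len(sequence)
    return shannon_entropy(c / total for c in counts.values())
```

A fair binary source gives one bit per symbol, while a source that always emits the same symbol gives zero, which matches the reading of entropy as a compression bound.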

Density,  mass  and  energy 

Density and mass, and their relation to energy, are properties of
matter that have parallel, and intuitively pleasing, meanings in the
information domain. Much research has taken place on the minimal
energy required by an attacker to mount a successful attack.
Density, like Kolmogorov Complexity, may measure the ability of a
sequence to be compressed. Mass may simply represent the number of
ones in a sequence, and energy, as in thermodynamics, may tie
together quantities such as mass, density, or entropy. The goal is
to find parameters that can be observed directly from information
sequences and to compare objective quantities on which to base the
science of information assurance. In the analytical framework
developed in this paper, Kolmogorov Complexity is analogous to mass
and is used to formulate a density metric.

Complexity 

A contribution of the research presented in this paper is to utilize
complexity, Kolmogorov Complexity in particular, as a fundamental
property of information for vulnerability analysis. The definition
of Kolmogorov Complexity rests upon the notion of a Turing Machine
program. The Turing Machine is one of the most fundamental,
general-purpose computing abstractions and is well known in computer
science. The Turing Machine consists of a seven-tuple
(Q, T, I, δ, b, q0, qf). Q is a set of states, T is a set of tape
symbols, I is a set of input symbols, b is a blank, q0 is the
initial state, qf is the final state, and δ is the next-move
function. There can be multiple tapes; for k tapes, δ maps a subset
of Q × T^k to Q × (T × {L, R, S})^k, where L, R, and S indicate
movement of the tape to the left, to the right, or remaining
stationary, respectively. Thus δ implements a "next move" function:
given a current state and tape symbol, δ specifies the next state,
the new symbol to be written on the tape, and the direction in which
to move the tape.

One approach to the study of security is to consider the Turing
Machine program as a representation of normal system operation. In
such an approach, if the Turing Machine program recognizes, or
accepts, an input string, then a user has gained access to the
system. If the Turing Machine program accepts a string that we did
not anticipate (Si), then the system is vulnerable, as stated in
Definition 1. Clearly, the Turing Machine program is an abstract
representation of any protocol implementation or operating
component. The set of unanticipated input strings that is accepted
is the vulnerability of the component (V), as shown in Lemma 1.


Definition 1: System Vulnerability.
If a Turing Machine program recognizes, or accepts, a string, then
the user entering that string is defined to have gained access to
the system. If the Turing Machine program accepts a string that was
not anticipated in the initial design of a system (Si), then the
system is vulnerable.

Lemma 1: Secure Component.
Given V, the set of accepted strings that were not anticipated, a
component is secure if and only if V = ∅.
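Definition 1 and Lemma 1 can be illustrated with a small sketch. Acceptance by a general Turing Machine program cannot be enumerated exhaustively, so the code below assumes a bounded string length and an acceptor given as a plain Python predicate. The names `accepted_strings`, `vulnerability`, `is_secure`, and the toy `flawed_acceptor` are all hypothetical and introduced only for illustration.

```python
from itertools import product

def accepted_strings(accepts, alphabet, max_len):
    """Enumerate every string of length <= max_len that the acceptor recognizes."""
    result = set()
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            s = "".join(symbols)
            if accepts(s):
                result.add(s)
    return result

def vulnerability(accepts, anticipated, alphabet, max_len):
    """V = accepted strings that the designer did not anticipate."""
    return accepted_strings(accepts, alphabet, max_len) - set(anticipated)

def is_secure(accepts, anticipated, alphabet, max_len):
    """Lemma 1, restricted to bounded-length strings: secure iff V is empty."""
    return not vulnerability(accepts, anticipated, alphabet, max_len)

# Hypothetical acceptor: meant to accept only "ab", but a flaw also admits "aab".
def flawed_acceptor(s):
    return s in {"ab", "aab"}
```

Under these assumptions, `vulnerability(flawed_acceptor, {"ab"}, "ab", 3)` contains exactly the unanticipated string, and the component is therefore not secure in the sense of Lemma 1.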


Throughout this paper, the attacker is assumed to have the objective
of exploiting any vulnerability, which requires the attacker to
understand enough of the component to design a system attack. The
attacker must determine the function performed by the Turing Machine
program. In this case, the attacker is assumed to have the ability
to observe every member of the input language in order to deduce
operation by viewing the output. The attacker is actually inferring
δ. As δ is inferred, more opportunities for attack may present
themselves.

Turing Machines and Kolmogorov Complexity

Information must be accessible to legitimate users while access is
denied to potential attackers. This is accomplished by increasing
the apparent complexity of access to information while providing
legitimate users with enough a priori knowledge to reduce that
apparent complexity. This leads one to conclude that complexity
itself is a useful metric. However, the search for an absolute
measure of complexity may be as difficult a problem as quantifying
information assurance. There is a good reason for this: they are, in
a sense, one and the same. The results in this paper demonstrate how
complexity can be estimated for use as a system-wide vulnerability
metric.


Kolmogorov Complexity is a measure of descriptive complexity: the
minimum length of a program with which a universal computer can
generate a specific sequence. Kolmogorov Complexity is described in
Equation (10), where φ represents a universal computer, p represents
a program, and x represents a string. Universal computers can be
equated through programs of constant length; thus a mapping can be
made between universal computers of different types. The string x
may be either data or the description of a process in an actual
system. Unless otherwise specified, consider x to be the program for
a Turing Machine described in Definition 1.


$$K_{\varphi}(x) = \min_{\varphi(p) = x} l(p) \qquad (10)$$


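$$K_{\varphi}(x \mid y) = \begin{cases} \min_{\varphi(p, y) = x} l(p) & \text{if such a } p \text{ exists} \\ \infty & \text{if there is no } p \text{ such that } \varphi(p, y) = x \end{cases} \qquad (11)$$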


Conditional Complexity, in Equation (11), quantifies the complexity
of string x given string y. Intuitively, it is the additional
complexity of string x beyond that in string y. Conditional
Complexity is used in developing the K-Map. A fundamental metric,
based upon Kolmogorov Complexity and used throughout the remainder
of this paper, is density. Density and its inverse, dispersion, are
shown in Definition 2. If x represents a program, then dispersion
can be considered inefficiency of implementation in terms of size. A
disperse implementation of a system has more transitions and states
than necessary. Thus, there is greater opportunity for an attacker
to find a weak point in the system. However, once an attacker breaks
into a disperse system, more energy, that is, a longer string, will
on average be required to reach the attacker's target. Greater
dispersion should therefore imply reduced brittleness (Definition 8)
of resistance to attack.


Definition 2: Density.
The density of x is K(x)/l(x), where l(x) is the length of x.
Dispersion is the inverse of density.
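Definition 2 can be approximated by substituting a compressed length for K(x). The sketch below is only an estimate under that assumption; zlib adds a small fixed overhead, so very short inputs can show an estimated density above 1. The function names `density` and `dispersion` mirror the definition but are otherwise introduced here.

```python
import zlib

def density(x: bytes) -> float:
    """Estimate K(x)/l(x), with K(x) approximated by zlib-compressed size in bytes."""
    if not x:
        raise ValueError("density is undefined for an empty string")
    return len(zlib.compress(x, 9)) / len(x)

def dispersion(x: bytes) -> float:
    """Dispersion is the inverse of density."""
    return 1.0 / density(x)
```

A long run of a single repeated byte has low estimated density and high dispersion, corresponding to an implementation that is much larger than its inherent complexity requires.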


Complexity as a Vulnerability Metric

Information assurance is increased by increasing the apparent
complexity of access to information for potential attackers while
providing legitimate users the least complex, or in some sense the
shortest, path to access of information. Figure 74 conceptually
illustrates an instance of secure and insecure operation in a
system. Secure and insecure operation exist to varying degrees in
the space of all possible forms of operation, M. Insecure operation,
M_I, consists of those methods of operation that allow an
information warfare aggressor entrance into, or access to control
points of, the information system. The intended secure operation
areas, M_S, are well known, and some of the insecure paths are also
known. Note that M_S and M_I can, and usually do, overlap. However,
the entire area of operation can be extremely large, and an
exhaustive search for all insecure operation is not feasible.

Figure 74. Set Theory View of Secure Operation. (M_S = Secure
Operations.)

In Figure 74, Euclidean distance corresponds to the degree of
security. This leads one to consider a metric space upon which to
base information assurance. The initial approach assumes only that
the metric has the characteristics of a metric in the mathematical
sense, as shown in Definition 3, where d is distance and p and q are
points.

Definition 3: Properties of an Information Assurance Metric.
(A) d(p,q) > 0 if p ≠ q; d(p,p) = 0
(B) d(p,q) = d(q,p)
(C) d(p,q) ≤ d(p,r) + d(r,q) for any r ∈ X
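The three properties in Definition 3 can be checked mechanically on any finite sample of points. The sketch below assumes a candidate distance function supplied as a Python callable; the name `is_metric` and the tolerance parameter are illustrative choices, and passing the check on a sample does not prove the properties hold on the whole space.

```python
from itertools import product

def is_metric(points, d, tol=1e-12):
    """Check properties (A)-(C) of Definition 3 on a finite set of points."""
    for p, q in product(points, repeat=2):
        if p == q and abs(d(p, q)) > tol:
            return False            # (A) d(p, p) = 0
        if p != q and d(p, q) <= 0:
            return False            # (A) d(p, q) > 0 for distinct points
        if abs(d(p, q) - d(q, p)) > tol:
            return False            # (B) symmetry
    for p, q, r in product(points, repeat=3):
        if d(p, q) > d(p, r) + d(r, q) + tol:
            return False            # (C) triangle inequality
    return True
```

For example, absolute difference on a few integers passes all three checks, while squared difference fails the triangle inequality and a signed difference fails positivity.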

As illustrated on the left side of Figure 74, an information system
decomposed into many operating components could have a surface area
as shown on the right side of Figure 74. Note that this surface is
likely to change as a function of time; however, the time indices
are not written for now. The points p and q are assumed to be
relative to some absolute value; p and q can be security values
either at different locations or at different time instances of the
system. If d is a measure of security, then Definition 3.A implies
that there is no difference in security between a point and itself.
However, there must be a difference between any two distinct points
in the security space. Definition 3.B states that the measure
between any two points in this space should be the same regardless
of the order in which one takes the measurement. This means that,
observed from a common viewpoint, if security is measured at two
different points in this space, p and q, then the measure of
security will be the same regardless of the order in which the
points are entered in the measure.

It does not imply anything about the strength of an attack from p to
q or an attack from q to p. It means, for example, that if p is less
than q, then an attack from outside the system against p will be
more likely to succeed than an attack against q. Finally, Definition
3.C states that the distance between any two points will be less
than or equal to the sum of the distances between each of those
points and a common third point. Again, remember that this is a
measure of security taken from a view outside the system of a
potential attack from outside the system.

As discussed in more detail in the remainder of this report, the
actual measure will change as an attacker penetrates the system and
gains more knowledge of it. Kolmogorov Complexity has been shown to
possess the characteristics of a metric space [99], and in Section
III an implementation is developed that generates a topology similar
to Figure 74 for a given system.

If information assurance can be proven to reside in a metric space,
or alternatively, if a metric space can be chosen in which
information assurance can reside, then principles of mathematical
analysis [100] can be used to rigorously determine more detailed
characteristics. For example, M can be extremely large, possibly
infinite. Are M_S, or conversely M_I, open sets? If so, can limit
points be defined? What does an open set mean with regard to
information assurance and security?

As a simple example, consider a password protection system. Each
character that a legitimate user of the system adds to a password
increases the number of possibilities that a brute-force
(non-dictionary) attack must try in order to guess the password.
Thus, the longer the passwords or encryption keys, the more secure
the system. While an infinite-length password is not possible,
security does begin to approach a limit point.
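The password example can be made concrete with a short calculation. The sketch assumes passwords are drawn uniformly at random from an alphabet of fixed size, which is an idealization; the function names are introduced here for illustration only.

```python
def search_space(alphabet_size: int, length: int) -> int:
    """Number of distinct passwords of exactly `length` symbols."""
    return alphabet_size ** length

def guess_probability(alphabet_size: int, length: int, guesses: int) -> float:
    """Chance that `guesses` distinct brute-force tries hit a uniformly random password."""
    return min(1.0, guesses / search_space(alphabet_size, length))
```

Each added character multiplies the search space by the alphabet size, so for a fixed attacker budget the success probability shrinks toward zero as the length grows without ever reaching it, which is the limit-point behavior described above.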

This can also be seen in any security safeguard that works by
increasing complexity, that is, by adding more non-redundant states
to a Turing Machine program, given the definition of performance in
Definition 7, in order to increase security. This approach to
safeguard design approaches a limit point but can never reach
perfect security. Security performance becomes less dense and less
brittle. However, in general, this appears to be the only known
approach, and thus limit points must exist.

Topological  Space  for  Information  Assurance 

By definition, an open set (E) is one in which every point is an
interior point. A point p is an interior point of E if there is a
neighborhood N of p such that N ⊂ E. A neighborhood N_r(p) of point
p consists of all points q such that d(p,q) < r, where r is called
the radius of the neighborhood. If security, as determined by a
given metric, is an open set, then there are significant
implications. The best that can be hoped for in such a case is to
determine limit points, because a distinct boundary between security
and insecurity would not exist. Will it be the case that adding
layers of security is much like adding "open covers", that is, the
result can never be perfect security, but rather an approach to a
limit point? The complement of an open set is closed; what does that
imply for the assessment of insecurity? Vulnerability analysis tools
have been developed that assume all vulnerabilities have been
identified and measured, and that the vulnerabilities can be
manipulated as discrete, closed sets.

In order to determine whether such measurements can be applied to
information assurance, consider topology, metric spaces, and the
fundamentals of measurement theory in more detail. The definition
below shows how the topology is induced by a metric d. In Definition
4, the topology τ is a collection of subsets of X such that ∅ ∈ τ
and X ∈ τ, any finite intersection of members of τ is in τ, and any
union of members of τ is in τ. Definition 5 illustrates the topology
that will be induced. This is particularly important in the
development of the K-Map discussed in detail in Section III. In the
vulnerability framework presented in this paper, the metric (d) is
density (K/L) in Definition 6.

The intuitive notion is that d represents the ease of movement of an
intruder from one vulnerability to another, where d: X × X → R. A
simple metric, as discussed previously, is to define d as the number
of state/transition sequences, within a Turing Machine program
representation of a system, that an intruder can follow to move from
vulnerability x to vulnerability y, or equivalently the cardinality
of the set V from Lemma 3. Does information assurance reside within
this metric space? One test would be whether the metric supports the
design tradeoffs required in determining brittleness in the design
of the system.

Table 6

Definition 4: Metric Space.
Let d be a metric on X. A metric space (X, d) is a topological space
where the topology τ is the smallest one that contains all sets of
the form {y : d(x, y) < A} for all x and A.

Definition 5: Induced Topology.
V is the set of vulnerabilities and (V, d) is the topology induced
by the choice of information assurance metric.

To answer the above question, let V be the set of currently
exploited vulnerabilities. Most information security approaches,
including the one above, assume that all vulnerabilities have been
discovered and measured. This can never be assumed to be the case.
Performance (a) from Definition 7 is an open set, and as new
security holes are discovered,
If V represents vulnerability and is open, then secure operation,
the complement of V, is closed. Assume that sup d(x, x0) < ∞ for any
x0. Note that x is now an element of the set of secure operations.
In other words, the number of secure operations is bounded. It is
well known that a subset of Euclidean space is compact if and only
if it is closed and bounded (the Heine-Borel theorem).

Building upon Definition 4 and Definition 5 requires that Turing
Machine program states (Q) be identified as either secure or
insecure. If an attacker can reach a member of q_insecure, then the
attacker is considered to have performed a successful attack. If an
attacker can never reach a member of q_insecure, then the system is
considered invulnerable. The challenge is that neither the attacker
nor the defender knows the entire structure of the Turing Machine
program: the attacker is unlikely to have complete knowledge of the
defender's system, and even the defender may not fully understand
the system that was developed. However, complexity estimation can be
applied without requiring a detailed understanding of the target
system. The following section presents results on the feasibility of
the complexity-based vulnerability analysis technique by applying it
to an active network.

3.5 BRITTLE SYSTEMS, DETERMINISTIC FINITE AUTOMATA, AND VULNERABILITIES

In order to understand the requirements for a vulnerability analysis
metric, consider the manner in which systems that implement
information assurance can be designed using such quantification.
Design involves the tradeoff of one benefit for another. Brittle
Systems Theory provides a framework for understanding the tradeoffs
between performance and failure of information systems.

Brittle systems analysis [108] is based on the idea that systems can
fail in a manner analogous to brittle fracture of materials. A
system can maintain very high performance until it fails quickly and
catastrophically, as illustrated by performance curve P_h in Figure
75, or it may fail by exhibiting lower performance in a gradual,
more ductile manner, as in curve P_l. The mapping between Brittle
Systems Theory and information assurance is shown in Table 7. This
analysis can be directly applied to the Turing Machine program
representation of a system. Changes in any of the state machine
parameters, Q, T, I, δ, b, q0, qf, may modify the brittleness of the
system. For example, addition of a new state and transition could
cause the system to behave in a more ductile or a more brittle
manner. What is the measure of performance in a Turing Machine
program model of information assurance? What does catastrophic
failure mean in a Turing Machine program model of information
assurance?

Figure 75. Definition of brittleness: brittle versus ductile
performance. (Both panels plot Performance.)

In order to answer these questions, the performance a from
Definition 7 is taken to be resistance to attack, and X in Figure 75
is the effort required by the attacker. The answers to the above
questions are intimately linked to the choice of metric. Based on
Definition 1, one could choose the metric to be the number of
state/transition paths available to an attacker to reach a
particular target state, or equivalently, the cardinality of the set
of strings (V) given in Lemma 1. Another possible metric could be
the proximity of the attacker's current state to the state that is
the target of an attack. Note that it is shown later that K/L is a
vulnerability measure and is related to |V|. Next, Brittle Systems
analysis and its relation to the complexity-based vulnerability
metric are discussed in more detail using finite automata.


Definition 7: Information Assurance Performance.
Information assurance performance is the inverse of the
vulnerability induced by the choice of metric, a = 1/|V|. A nearly
invulnerable system has nearly infinite performance and an extremely
vulnerable system has nearly zero performance.


A Deterministic Finite Automaton (DFA) consists of a 5-tuple
(S, I, δ, s0, F), where S is the set of states, I is the input
alphabet, δ is a mapping from S × I into S, s0 is the start state,
and F is a subset of S called the final, or accepting, states. A DFA
is less powerful than a Turing Machine program. However, DFAs have
been well studied and provide a framework in which new theories
related to information assurance can be studied. An example of
Brittle Systems analysis using Definition 1 for vulnerability is
illustrated for the DFA shown in Figure 76. A single vulnerability
is represented as a single modified transition, t_faulty. The
modified transition represents an error in either the design or the
implementation that may allow an attacker to penetrate the system.
The effect of each transition modified from its original source node
to each possible destination node in the automaton is exhaustively
checked. The effort expended by an attacker is assumed to be
proportional to the length of the strings used in the attack. In the
application of brittleness to vulnerability analysis, performance P
is defined by a (Definition 7); X is defined by A_ls, which is the
effort of an attacker measured in terms of the string size required
to reach an unintended accepting state.

The algorithm requires starting with the actual system as
represented in Figure 76, modifying a transition, and then recording
the number of additional strings accepted. This is repeated for each
transition in the base system. As shown in Figure 77, a modification
of the transition from State 7 on Input 3 to Destination State 2,
written (7,3,2), yields a small number of vulnerabilities at string
length two, with a maximum of 1000 vulnerabilities at string length
8. This performance is ductile compared to the graph shown in Figure
78, where transition (1,3,1) is modified. Figure 78 shows more
brittle behavior because it takes a longer string, and thus more
effort by the attacker, to find vulnerabilities; however, the
vulnerability increases rapidly as the string length increases. A
more precise definition of brittleness is given in Definition 8.
Brittleness, defined by the area given in the definition, is in
units of A_ls/|V|.

Figure 76. Example deterministic finite automaton of a system
undergoing brittle analysis with the complexity-based vulnerability
metric.

Figure 77. Ductile resistance to attack for system in Figure 79 with
fault (7,3,2). (Plot: Accepted Strings vs. Language Size; horizontal
axis: String Size (A_ls).)
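The fault-injection procedure described above (rewire one transition, then record the additional strings accepted at each string length) can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the report's implementation: the DFA is assumed to be stored as a dictionary from (state, symbol) to next state, missing entries are assumed to reject, and the function name `additional_accepted` is introduced here. Counting is done by dynamic programming over pairs of states rather than by enumerating strings.

```python
def additional_accepted(delta, start, accepting, alphabet, fault, max_len):
    """Count strings, per length 0..max_len, accepted only after one fault.

    delta maps (state, symbol) -> next state; a missing entry means reject.
    fault = (state, symbol, new_destination), in the style of (7, 3, 2).
    """
    state, symbol, new_dest = fault
    faulty = dict(delta)
    faulty[(state, symbol)] = new_dest

    # (state in original DFA, state in faulty DFA) -> number of strings reaching it.
    # The original-side state is None once the original machine has rejected.
    pairs = {(start, start): 1}
    extra = []
    for _ in range(max_len + 1):
        extra.append(sum(n for (s, f), n in pairs.items()
                         if f in accepting and s not in accepting))
        nxt = {}
        for (s, f), n in pairs.items():
            for a in alphabet:
                f2 = faulty.get((f, a))
                if f2 is None:
                    continue        # faulty machine rejects: no new acceptance possible
                s2 = delta.get((s, a)) if s is not None else None
                nxt[(s2, f2)] = nxt.get((s2, f2), 0) + n
        pairs = nxt
    return extra
```

Repeating this call for every transition and every alternative destination gives one curve per fault; a curve that stays near zero and then jumps corresponds to the brittle case, while a gradual rise corresponds to the ductile case.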

Definition 9 provides a means for easily computing complexity in the
world of finite automata. Next, the relationship between brittleness
and complexity is addressed. One might intuit that a faulty
transition in a less complex automaton will have less of an impact
than a faulty transition in a complex version of the equivalent
automaton. The definition of equivalent automata is given in
Definition 10.

Figure 78. Brittle resistance to attack for system in Figure 79 with
fault (1,3,1). (Plot: Accepted Strings vs. Language Size; horizontal
axis: String Size (A_ls).)


Definition 8: Brittleness.
The brittleness of a system is a relative measure based upon the
size of the area defined by ∫_T (B − A) dT from Figure 79, where A
and B are normalized to have the same area and T is a tolerance
range. In the operation below, T is defined as the width of the line
formed by the intersection of A and B.

Definition 9: Complexity of NDFA.
The complexity of a DFA or NDFA is the number of transitions in the
smallest DFA that accepts the original language of the DFA.

Definition 10: Equivalence of Automata A and B.
Automatons A and B are equivalent if and only if A and B accept the
same language.

Definition 11: Correlation between Brittleness and Complexity.
There is a correlation between brittleness and density. A dense
system, being more highly optimized, will fail at a faster rate than
a simple version of the same system. On the other hand, a simple
system will, on average, have more opportunity for error, while
those errors are less catastrophic.
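Definition 9 measures complexity as the number of transitions in the smallest DFA for the same language. One way to compute this, assuming the DFA is stored as a dictionary from (state, symbol) to next state, is Moore's partition refinement followed by a transition count that ignores the dead (non-accepting sink) block. The report does not say which minimization algorithm was used; this technique and the name `dfa_complexity` are assumptions made for illustration.

```python
def dfa_complexity(delta, start, accepting, alphabet):
    """Transitions in the minimal trimmed DFA that accepts the same language."""
    DEAD = object()                     # explicit sink for missing transitions

    def step(s, a):
        return delta.get((s, a), DEAD)

    # 1. Keep only states reachable from the start state (plus the sink).
    reachable, frontier = {start, DEAD}, [start]
    while frontier:
        s = frontier.pop()
        for a in alphabet:
            t = step(s, a)
            if t not in reachable:
                reachable.add(t)
                frontier.append(t)

    # 2. Moore refinement: split blocks until the number of blocks stops growing.
    block = {s: (s in accepting) for s in reachable}
    while True:
        signature = {s: (block[s], tuple(block[step(s, a)] for a in alphabet))
                     for s in reachable}
        ids = {sig: i for i, sig in enumerate(set(signature.values()))}
        new_block = {s: ids[signature[s]] for s in reachable}
        stable = len(set(new_block.values())) == len(set(block.values()))
        block = new_block
        if stable:
            break

    # 3. Count distinct block-level transitions, ignoring the dead block.
    dead = block[DEAD]
    return len({(block[s], a) for s in reachable for a in alphabet
                if block[s] != dead and block[step(s, a)] != dead})
```

Two automata that are equivalent in the sense of Definition 10 receive the same value, so an inefficient implementation with redundant states is assigned the complexity of its compact form.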


Figure 79 and Figure 80 show a simple and a complex implementation,
respectively, of the same information system. Figure 79, as a simple
implementation, is what might be intuitively referred to as an
inefficient implementation, because it contains many more states
than necessary. This yields the opportunity for more vulnerabilities
and faults. However, it also requires more effort by the attacker,
that is, a larger string size A_ls, to reach a given target.

Figure 80. Implementation of the DFA in Figure 84 that approaches
true complexity.


Table 7. Brittle vulnerability analysis definitions.

Materials Science | Brittle Systems | Information Assurance
------------------+-----------------+----------------------
Stress | Amount parameter exceeds its tolerance | Applied force under the weight of an attack, A_lsT
Toughness | System robustness | Encryption strength and sensitivity of intrusion detectors
Ductility | Level of performance outside tolerance | Ability of system to gracefully degrade given an attack, A_ls/|V| : A_ls > A_lsT
Plastic Strain | Degradation from which the system cannot recover | Trojan horse
Brittle Fracture | Sudden steep decline in performance | Sudden catastrophic collapse of all information assurance
Young's Modulus | Amount tolerance exceeded over degradation |
Deformation | Degradation in performance | The amount by which vulnerability has been increased due to an attack, ~(1/|V|)
Brittleness | Ratio of hardness to ductility; estimated as difference in performance curves when outside tolerance | (A_lsh/|V|) − (A_lsd/|V|) : (A_lsh < A_lsT and A_lsd > A_lsT)
Ductile Fracture | Graceful degradation in performance | Ability of information to gracefully degrade under an attack
Reversible Strain | Degradation from which the system can recover | Trojan horse detection and removal
Hardness | Level of performance within tolerance limits | Resistance to decryption, A_ls/|V| : A_ls < A_lsT


Figure 80 is a closer representation of the true complexity of the
same system. It has fewer opportunities for failure; however, the
failures that occur will have a more significant impact. In Figure
81 and Figure 82, brittleness and complexity are compared.

Brittleness is computed as defined in Definition 8. Performance is
defined based upon the number of accepted strings and string size.
The ratio of the number of accepted strings to total string size is
inversely proportional to the performance. For each possible fault,
this ratio is compared to a consistent base case consisting of an
exponentially growing number of accepted words as the string size
increases. A brittle system accepts few words initially and then
suddenly accepts a large number, while a ductile system accepts a
moderate but gradually increasing number with no sudden increase.
The brittle measure is graphed as a function of a fault in the state
specified on the independent axis. A fault is generated by the
re-connection of a single specified transition to a destination
other than that which was originally specified. A single fault leads
to as many as n − 1 possible faulty states, where n is the number of
original states. Complexity is estimated as the number of
transitions in the smallest representation of the resulting faulty
system's DFA.

Comparing Figure 81 and Figure 82, there appears to be an inverse
relationship between brittleness and complexity. That is, a system
with greater complexity results in lower brittleness. Greater
complexity indicates that a larger number of transitions and states
exist; thus there is more opportunity for an attack, but more effort
is required by the attacker to successfully complete the attack. In
Figure 83 and Figure 84 a similar analysis is performed on the more
compact representation, closer to the true complexity, of the same
system. A system with an implementation whose size is closer to its
true complexity is more brittle. The inverse relationship between
complexity and brittleness holds in the more compact system (Figure
80) as well.

Figure 81. Brittle measure of DFA shown in Figure 78 in dimensions
of A_ls/|V| versus t_faulty. (Plot: Brittleness vs. Fault Location;
horizontal axis: Faulty Transition (t_faulty).)

Figure 82. Complexity of DFA shown in Figure 78 in dimensions of
K(DFA) versus t_faulty. (Plot: Complexity vs. Fault Location;
horizontal axis: Faulty Transition (t_faulty).)

Figure 83. Complexity measure of system shown in Figure 79 in
dimensions of K(DFA) versus t_faulty. (Plot: Complexity vs. Fault
Location; horizontal axis: Faulty Transition (t_faulty).)


Figure 84. Brittle measure of system shown in Figure 79 in
dimensions of A_ls/|V| versus t_faulty. (Plot: Brittleness vs. Fault
Location.)

An important result in this exploration of the relationship among
vulnerability, complexity, and brittleness is that the greater the
dispersion, the lower the brittleness. This suggests that larger
systems, requiring traversal of larger numbers of states and
transitions to reach an accepting state, or target of attack,
require more effort to attack successfully. A system that has a
large amount of inherent complexity cannot be designed more
compactly than the length of its Kolmogorov Complexity. An
intelligent attacker may be able to observe an inefficiently
implemented system and reduce it to its most compact form, that is,
its Kolmogorov Complexity, thus easily identifying paths of attack
to reach specific targets. A truly safe system is thus obtained not
by building inefficiency into the system, but rather by making the
view to the attacker as inherently complex as possible.

4. Active Networks


Active  Virtual  Network  Management  Prediction  enhance¬ 
ment  via  Kolmogorov  complexity  estimation 


Kolmogorov Complexity, $K(x)$ (see [10] for an introduction to Kolmogorov Complexity and [5], [6], [8], and [9] for applications), is the optimal compression bound of string $x$. This incomputable, yet fundamental, property of information has vast implications in a wide range of applications including network and system optimization, security, and bioinformatics. Active networks [3] form an ideal environment in which to study the effects of tradeoffs in algorithmic and static information representation because an active packet consists of both code and static data. A question active network application developers must answer is, "What is the optimal proportion of packet content that should be code versus data?" A method for obtaining the answer to this question comes from direct application of Minimum Description Length (MDL) ([10] and [14]) to an active packet. Let $D_x$ be a binary string representing $x$. Let $H_x$ be a hypothesis, in algorithmic form, that attempts to explain how $x$ is formed. MDL states that the sum of the length of the shortest encoding of a hypothesis about the model generating the string and the length of the shortest encoding of the string encoded by the hypothesis will estimate the Kolmogorov Complexity of string $x$: $K(x) \approx K(H_x) + K(D_x \mid H_x)$. A method for determining $K(x)$ separates randomness from non-randomness in $x$ by incorporating non-randomness, which is computable, as the shortest encoded program that represents the original string. The random part of the string represents the error, that is, the difference between the original string and the output of the encoded program. Thus, the goal is to minimize $l(H_e) + l(D_x \mid H_e) + l(E)$, where $l(x)$ is the length of string $x$, $H_e$ is the estimated hypothesis used to encode the string ($D_x$), and $E$ is the error in the hypothesis, $D_x - (D_x \mid H_e)$. The more accurately the hypothesis describes string $x$, the shorter the encoding of the string. An active packet is measured as shown in Figure 85, where choosing an optimal proportion of code and data minimizes the packet length. The goal is to learn how to optimize the combination of communication and computation


Figure 85. Algorithmic content. [Plot: Information Density versus Length (bytes).]

enabled by an active network. Clearly, if $K(x)$ is estimated to be high for the transfer of a piece of information, then the benefit of having code within an active packet is minimal. On the other hand, if the complexity estimate is low, then there is great potential benefit in including the information in algorithmic form within the active packet. When this algorithmic information changes often and impacts low-level network devices, active networking provides the best framework for implementing solutions (a specific example of separating non-randomness from randomness, although not explicitly stated as such, can be found in mobility management as discussed in [3] and [11]).
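As a rough illustration of the MDL trade-off described above, the following Python sketch compares a few candidate hypotheses for a numeric sequence by the sum of an assumed hypothesis cost and the compressed size of the residual error. It is a minimal sketch, not the report's implementation: the helper names (`encoded_length`, `description_length`), the cost of 4 bytes per hypothesis parameter, and the use of zlib as a stand-in for the shortest encoding are all assumptions made for illustration, and the data-given-hypothesis and error terms are folded into a single residual cost.

```python
import struct
import zlib


def encoded_length(values):
    """Approximate l(.) in bytes: zlib-compressed size of the values packed as 32-bit ints."""
    raw = struct.pack(f"<{len(values)}i", *values)
    return len(zlib.compress(raw, 9))


def description_length(data, hypothesis, params):
    """Assumed two-part cost: 4 bytes per hypothesis parameter plus compressed residual."""
    predicted = [hypothesis(i, *params) for i in range(len(data))]
    residual = [d - p for d, p in zip(data, predicted)]
    return 4 * len(params) + encoded_length(residual)


def no_model(i):
    # "All data, no code": the residual is the data itself.
    return 0


def constant(i, c):
    return c


def linear(i, a, b):
    return a * i + b


# Hypothetical sequence: a linear trend with a small periodic disturbance.
data = [3 * i + 7 + (1 if i % 10 == 0 else 0) for i in range(200)]

candidates = {
    "none": (no_model, ()),
    "constant": (constant, (7,)),
    "linear": (linear, (3, 7)),
}
lengths = {name: description_length(data, h, p) for name, (h, p) in candidates.items()}
best = min(lengths, key=lengths.get)
print(lengths, "->", best)
```

Under these assumptions the hypothesis that captures the trend leaves only a sparse, highly compressible residual, so the combined length is smallest when most of the regularity is carried as "code" and only the irregular remainder as "data".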

An active packet that has been reduced to the length of the best estimate of the Kolmogorov Complexity of the information it transmits will be called the minimum size active packet. When the minimum size active packet is executed to regenerate string $x$, the $D_x \mid H_e$ portion of the packet predicts $x$, using static data ($E$) to correct for inaccuracy in the estimated hypothesis. There are interesting relationships between Kolmogorov Complexity, prediction, compression, and the Active Virtual Network Management Prediction (AVNMP) mechanism described in [3]. These relationships are discussed and experimentally validated throughout this paper. The next section provides an overview of AVNMP before discussing its relationship to Kolmogorov Complexity (see footnote 2). After the required background on AVNMP is explained, the relationship to Complexity Theory is developed, beginning from a high-level overview and then driving down into detailed relationships and experimental results.

4.1 ACTIVE VIRTUAL NETWORK MANAGEMENT PREDICTION OVERVIEW

The Active Virtual Network Management Prediction (AVNMP) architecture provides a network prediction service that utilizes the capability of Active Networks to easily inject fine-grained models into the communication network to enhance network performance. AVNMP, injected into the network as an active application, is capable of modeling load and propagating state information in a manner that meets the demand for accuracy at a particular active node. Greater demand for prediction accuracy is met at the cost of AVNMP performance, that is, the ability of AVNMP to predict farther into the future. While this paper focuses on network traffic and load prediction, an AVNMP application to predict CPU utilization for Active Networks in collaboration with the National Institute of Standards and Technology ([4], [12], and [13]) has been demonstrated. The inherently distributed nature of communication networks and the computational power unleashed by the Active Networking paradigm have been used to mutual benefit in the development of the Active Virtual Network Management Prediction mechanism. Active Networks benefit from AVNMP by continuously receiving information about potential problems before they occur.

AVNMP benefits from Active Networks in many ways. The first, and most practical, is the ease of development and deployment of this novel prediction mechanism. This could not have been accomplished so quickly or easily given today's closed, proprietary network device processing. Another benefit is the fact that network packets now have the unprecedented ability to control their own processing. Great advantage was taken of this new capability in AVNMP. Virtual messages, varying widely in content and processing, can adjust their predicted values as they travel through the network. Finally, Active Networks add a level of robustness that cannot be found in today's networks. This robustness is due to the ability of AVNMP system components, which are active packets, to easily migrate from one node to another in the event of failure, or of the prediction of failure provided by AVNMP itself.

The desired characteristics of AVNMP are a large Lookahead time, high prediction accuracy, low overhead, and robust operation. These characteristics are inter-related, and a suitable tradeoff needs to be determined during configuration of the system. The AVNMP experimental validation configuration for the initial test discussed in this paper is a feed-forward network consisting of a host containing the Driving Process and four intermediate active network nodes containing Logical Processes as shown in Figure 86. AH-1 and AH-2 are host nodes and AN-1 through AN-5 are active network nodes. The edges between the nodes represent links between the labeled ports on each node. All nodes are Sun Sparcs running the Magician active network execution environment. The AVNMP system parameters were configured as shown in Table 8. In this experiment AVNMP is predicting the packet input and output rate for each link at each node, from an application residing on AH-1 that is transmitting active audio packets.

Figure 86. Experimental configuration.


Table 8. AVNMP Parameters

Parameter                           Value
Sliding Window Lookahead Length     200 seconds
Virtual Message Generation Rate     0.5 virtual messages/millisecond
Virtual Message Step Size           20 seconds
Tolerance                           500 messages/second (reduced by half periodically)
Ratio of Virtual to Real Messages   1 virtual message/real message


2. Current project progress and experimental code are maintained at http://www.research.ge.com/~bushsf/ftn.
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The  State  Queue  plot,  Figure  87,  shows  the  pre¬ 
dicted  traffic  load  values  cached  in  the  State  Queue 
as a function of LVT and Wallclock. As Wallclock approaches any given Local Virtual Time (LVT), the predicted load values converge towards the actual load. The general operation is illustrated in the next five graphs where all measurements, unless otherwise indicated, are from node AN-4. These curves validate intuitive trends in the operation of AVNMP. Figure 88 shows the reduction in tolerance versus time that is pre-programmed into each Logical Process. The Y-axis is the tolerance that is demanded between the predicted value and the actual value of an SNMP packet counter. This value is decreased purposely in this experiment in order to create a greater demand over time for accuracy and thus create a challenging validation of the AVNMP system under gradually increasing stress. In Figure 89 the proportion of out-of-tolerance messages is shown as a function of Wallclock. The Y-axis is the proportion of messages that arrived at a specific node out of tolerance, that is, the actual value exceeded the predicted value by an amount greater than the tolerance setting. As Wallclock progresses, the tolerance is purposely reduced, causing a greater likelihood of messages exceeding the tolerance. This is done in order to validate the performance of the system as stress, in the form of greater demand for accuracy, is increased. Figure 90 shows the prediction error as a function of Wallclock. The Y-axis is the difference in the number of packets received versus the number of packets predicted to have been received. This graph verifies that the system is producing more accurate predictions as the demand for accuracy increases. However, the Y-axis of Figure 91 shows the Lookahead decreasing versus Wallclock. The expected Lookahead time is the difference between Wallclock and the Local Virtual Time at a particular node. The demand for greater accuracy reduces the distance into the future that the system can predict. Finally, in Figure 92, speedup, the ratio of virtual time to Wallclock of the real system, is shown as a function of Wallclock. The speedup is reduced as the demand for accuracy is increased. As previously mentioned, only for purposes of this experiment, the tolerance is being reduced as Wallclock progresses, causing the accuracy to increase while losing performance in terms of speedup and Lookahead.
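The two performance measures used in this discussion, Lookahead and speedup, can be written down in a few lines. The sketch below is illustrative only: the function names are hypothetical, Lookahead is taken with the sign convention LVT minus Wallclock (positive when the node is ahead of real time), and speedup is interpreted as virtual time advanced per unit of elapsed Wallclock, which is an assumption about how the ratio is measured.

```python
def lookahead(lvt, wallclock):
    """Distance into the future (in seconds) that a node has predicted."""
    return lvt - wallclock


def speedup(lvt_start, lvt_end, wall_start, wall_end):
    """Virtual time advanced per unit of elapsed Wallclock (assumed interpretation)."""
    elapsed = wall_end - wall_start
    if elapsed <= 0:
        raise ValueError("wall_end must be later than wall_start")
    return (lvt_end - lvt_start) / elapsed


# Hypothetical readings from one node.
print(lookahead(lvt=1200.0, wallclock=1000.0))
print(speedup(lvt_start=0.0, lvt_end=600.0, wall_start=0.0, wall_end=300.0))
```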


Figure 87. State queue.

Figure 88. Tolerance setting decreases as wallclock increases, thus demanding greater accuracy.

Figure 89. Demand for greater accuracy causes the proportion of out-of-tolerance messages to increase. [Plot: Measured Proportion Out-of-Tolerance.]

AVNMP  overhead 

AVNMP has the potential to generate two forms of overhead: processing overhead and bandwidth overhead. If the predicted results are within the user-specified error tolerance and the user fully utilizes the predicted results, then overhead is at a minimum. The question of overhead versus benefit depends upon the perceived utility of predictive capability and depends significantly upon the manner and application in which it is used. It is the author's belief that load and processing prediction are of particularly great importance in Active Networks, where routing is based upon not only load, but also the processing capability required by active applications. In this section, the load prediction application example is continued with overhead results displayed in terms of processing time and number of packets transmitted. The expected ANEP [3] packet size measured during the test was 1000 bytes.

Figure 90. Predictions become more accurate...

Figure 91. ... at the expense of lookahead...

Figure 92. ... and speedup.

Figure 93. Number of virtual messages versus wallclock.

Figure 94. Expected task execution time as a function of wallclock.


Task  execution  time  and  message  overhead 

The task execution time is the Wallclock time the system spends executing a non-rollback message. It was expected that task execution time would be essentially constant; however, it increases in direct proportion to the number of rollbacks as shown in Figure 94. This is caused by the lack of fossil collection. The increase in the number of values in the State Queue is causing access of the State Queue and MIB to slow in proportion to the queue size.

Figure 93 displays the number of virtual messages versus Wallclock and Figure 95 displays the total number of anti-messages. The anti-message count is expected to increase over time; it is reset every time the tolerance is tightened (every 5 minutes in this case).
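The tolerance schedule used in this experiment (Table 8 gives an initial tolerance of 500 messages/second, reduced by half periodically, and the tightening interval here is 5 minutes) and the out-of-tolerance test can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the schedule is assumed to be an exact step function of Wallclock, and the out-of-tolerance test is assumed to be symmetric (absolute difference), whereas the text describes the actual value exceeding the predicted one.

```python
def tolerance_at(wallclock_s, initial=500.0, period_s=300.0):
    """Tolerance in messages/second after halving once per elapsed period."""
    halvings = int(wallclock_s // period_s)
    return initial * 0.5 ** halvings


def out_of_tolerance(actual, predicted, tolerance):
    """True when the prediction misses the actual value by more than the tolerance."""
    return abs(actual - predicted) > tolerance


# Hypothetical check of one predicted packet-rate value 7 minutes into a run.
tol = tolerance_at(7 * 60)
print(tol, out_of_tolerance(actual=1000.0, predicted=700.0, tolerance=tol))
```

A message found out of tolerance by such a test is what triggers a rollback and the associated anti-messages counted in Figure 95.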

AVNMP  robustness 

AVNMP consists of two main types of active packets: AvnmpLP, which is the Logical Process, and AvnmpPacket, which is the virtual message. If an AvnmpLP packet is dropped, the destination node will not have the capability to work forward in time or forward virtual messages. Thus, AVNMP features will not be available on the node and the accuracy of other nodes may be reduced. If an AvnmpPacket is dropped or unexpectedly delayed, accuracy will be reduced because the State Queues of downstream nodes will lack a predicted value. However, AVNMP will continue to operate. In the next section the role of complexity in understanding prediction is discussed.

Figure 95. Number of anti-messages versus wallclock.

4.2 TOWARDS COMPLEXITY

AVNMP can provide early warning of potential problems; however, the identification of a solution and marshaling of automated solution entities within an active network has not yet been fully addressed. This project has begun to lay the groundwork for such automated composition of management solutions within an active network [3]. This direction is being carried forward by exploration of a relatively unexplored area: understanding the benefits of active networking, Algorithmic Information Theory, and its close companion, Complexity Theory. To our knowledge, this work is the first to propose and begin investigation into the newly available processing power of Active Networks through the concept of Complexity and Algorithmic Information ("Streptichrons") as shown in Figure 96. Legacy networks, which are today's passive networks, have been designed to optimize transmission of passive data using bit compression based upon the underlying notion of Shannon Entropy. AVNMP has shown that active networks allow for the possibility of executable models and that the corresponding information packets might be best studied with Kolmogorov Complexity as the underlying theory. It is serendipitous that Complexity Theory has been receiving more attention lately and is making significant theoretical progress at the same time that research into active networking is taking place. Active networks provide a new paradigm and enhanced capabilities, which, when combined with ideas from Algorithmic Information Theory [10], might lead to superior, innovative solutions to problems of network management. One possible approach proposes to combine Kolmogorov Complexity with the science of Algorithmic Information Theory (sometimes called Complexity Theory) to build self-managed networks that draw on fundamental properties of information to identify, analyze, and correct faults, as well as security vulnerabilities, in a distributed information system [8], [9]. Specifically, we suspect that complexity measures can be used to detect and analyze problems in a network, and to facilitate techniques to remedy network faults. We also envision that Kolmogorov Complexity can be applied directly to improve the performance of AVNMP. In [5] and [6] the concept of monitoring the change in Kolmogorov Complexity of a system was first introduced for Information Assurance.

Figure 96. Active networks and legacy networks as viewed by AVNMP.

According to Complexity Theory, the complexity of an information unit is the size of the smallest program capable of producing the unit. Similarly, Algorithmic Information Theory defines the complexity of an information unit to be the unit's length after the unit has been compressed to the maximum extent possible. These two views can be related through theory. In general, complexity is not computable; however, the bounds on complexity tighten continuously as fundamental research in Kolmogorov Complexity progresses. For example, the Minimum Description Length (MDL) [14] estimate for Kolmogorov Complexity considers that the best measure for complexity of an information unit minimizes the sum of the length of the description of a theory that produces the unit and the length of the unit encoded using the theory. In this section, we use MDL as one approach to estimate Kolmogorov Complexity, and we suggest its application as a means to improve the performance of AVNMP.
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One  potential  drawback  to  AVNMP,  gently 
pointed out earlier in this paper, is the fact that AVNMP itself consumes resources in an effort to predict resource usage in a network. Resource consumption by AVNMP is tied directly to accuracy: higher accuracy costs more in terms of bandwidth utilization, associated with simulation rollbacks and the concomitant transmission of anti-messages. Despite this relationship, potential exists to nearly reach the theoretical minimum amount of bandwidth needed to achieve the maximal model accuracy. This possibility arises because AVNMP consists of many small, distributed models (each a description of a theory) that work together in an optimistic, distributed manner via message passing (data). Each AVNMP model can be transferred, using Active Networks, as a Streptichron [3], which is any message that contains an executable model in addition to data. Using Streptichrons, the optimal mix of data and model can be transmitted to closely approximate the minimum MDL. Achieving maximal model accuracy at minimal bandwidth provides the best AVNMP accuracy at the least cost in AVNMP resource consumption.

Other possibilities exist to exploit Kolmogorov Complexity to improve AVNMP performance. For example, one can apply the MDL technique to the rollback frequency of all the AVNMP-enhanced nodes in a network. A low rollback complexity (which suggests a high compressibility in the observed data) would indicate patterns in the rollback behavior that could be corrected relatively easily by tuning AVNMP parameters. High complexity (low compressibility) would indicate the lack of any computable patterns, and would suggest that little performance improvement could be achieved by simply tuning parameters. Thus, we hypothesize that our tuning gradient should be guided toward regions of high complexity, which suggests that we can tune parameters to improve the rollback frequency. The next section focuses upon experimental results relating prediction to complexity gathered from the operation of the AVNMP system.
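The idea of estimating rollback complexity from compressibility can be sketched with a general-purpose compressor. The following is a minimal sketch, not the method used in the experiments: the function name `compressibility`, the encoding of rollback behavior as one 0/1 flag per observation interval, and the use of zlib as the compression estimator are all assumptions for illustration.

```python
import random
import zlib


def compressibility(events):
    """Compressed-to-raw size ratio of a 0/1 rollback trace (lower = more pattern)."""
    raw = bytes(events)  # one byte per interval: 1 = rollback occurred, 0 = none
    return len(zlib.compress(raw, 9)) / len(raw)


# Hypothetical traces: a strictly periodic rollback pattern versus a random one.
periodic = [1 if i % 20 == 0 else 0 for i in range(4000)]
rng = random.Random(0)
irregular = [rng.randint(0, 1) for _ in range(4000)]

print("periodic :", compressibility(periodic))
print("irregular:", compressibility(irregular))
```

A trace that compresses well, like the periodic one, is the low-complexity case in which parameter tuning is expected to help; a trace that resists compression corresponds to the high-complexity case.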

4.3 AVNMP AND KOLMOGOROV COMPLEXITY

In AVNMP, information that impacts the network is transmitted based upon prediction at a low level within the network. Thus, AVNMP allows experimentation in defining the boundaries within which active networking is beneficial. In Figure 97 an active and a passive form of AVNMP are represented. The passive case is represented in the upper portion of the figure. In the passive case, actual data ($D_x$) is observed at the Driving Process. A hypothesis is formed about the data, and predicted data ($D_y$) is generated in the form of static virtual messages. The term static indicates that the message contains no executable code. When error in the hypothesis exceeds a preset threshold, AVNMP causes rollbacks to occur in order to adjust for the inaccuracy. In the lower portion of Figure 97, the hypothesis is included within each packet and is used to encode within the code portion of the active packet.

Figure 97. Active versus passive form of AVNMP.

What is the relationship between the estimated operating hypothesis ($H_e$) as used in the AVNMP packet encoding and as the predictor in the Driving Process? First, they are the same hypothesis. Second, it has been shown [10] that the shorter the packet, the better the predictor. Conversely, the worse the prediction, the longer the value is within the AVNMP packet encoding. Can Active Virtual Network Management Prediction benefit from the fact that the smallest algorithmic form is also the most likely predictor of a sequence? This can come about because Driving Processes and Streptichrons (active virtual messages anticipating events in the future) benefit by being both small and accurate as shown in Figure 98. The objective is to increase the rate at which the predictions held within the State Queue converge to the actual value that will occur in the future, and to converge to the value before it actually exists. Actual and predicted values within a particular instance of a State Queue were shown in Figure 87. Let us examine AVNMP results in light of complexity in more detail in the next section.

Figure 98. Better Prediction Implies Smaller Packets Implies Better AVNMP Performance Implies Better Prediction.

Figure 99. Load Prediction Hypothesis.

Load  prediction  and  complexity  in  active 
virtual  network  management  prediction 

Our Active Network Kolmogorov Complexity estimator is currently implemented as a quick and simple compression estimation method. It returns an estimate of the smallest compressed size of a string. It is based upon computing the entropy of the weight of ones in a string. Specifically, it is defined in Equation 12,

$$K(x) \approx l(x)\,H\!\left(\frac{x\#1}{x\#1 + x\#0}\right) + \log_2 l(x) \qquad (12)$$

where $x\#1$ is the number of 1 bits and $x\#0$ is the number of 0 bits in the string whose complexity is to be determined. Entropy is defined in Equation 13.

$$H(p) = -p \log_2 p - (1.0 - p) \log_2 (1.0 - p) \qquad (13)$$

See [7] for other measures of empirical entropy and their relationship to Kolmogorov complexity. The expected complexity is asymptotically related to entropy as shown in Equation 14.

$$H(X) \sim \sum_{l(x) = n} P(X = x)\, C(x) \qquad (14)$$
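The estimator of Equations 12 and 13 (string length times the binary entropy of the fraction of one bits, plus the base-2 logarithm of the length) is simple enough to sketch directly. The code below is an illustrative reading of those two equations, not the project's code: the function names are hypothetical, the input is assumed to be a string of '0' and '1' characters, and the entropy is taken to be zero when the string is all zeros or all ones.

```python
import math


def binary_entropy(p):
    """Equation 13, with H(0) = H(1) = 0 by the usual convention."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def estimate_complexity(bits):
    """Equation 12: l(x) * H(ones / (ones + zeros)) + log2(l(x)), in bits."""
    n = len(bits)
    if n == 0:
        return 0.0
    ones = bits.count("1")
    zeros = bits.count("0")
    if ones + zeros != n:
        raise ValueError("input must contain only '0' and '1' characters")
    return n * binary_entropy(ones / (ones + zeros)) + math.log2(n)


print(estimate_complexity("0101"))      # balanced string
print(estimate_complexity("00000000"))  # constant string
```

Because the estimate depends only on the proportion of ones, it cannot distinguish a patterned string such as "0101" from a random string with the same bit counts; it is a quick upper-level estimate rather than a true compression bound.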

Load prediction data sampled from execution of AVNMP is analyzed relative to several hypotheses. The goal is to use a simple example to demonstrate the relationship among accuracy of hypotheses, complexity, and compression. The initial hypothesis (regardless of naivete in choice of hypothesis) is that the data can be characterized by a simple linear extrapolation based upon the last sampled load values. This is shown in Figure 99, where the gray boxes are actual load samples and the black stars are predicted load samples. Note that the predicted load is based upon a short history, shown in the graph as the initial match between predicted and actual load.

Figure 100. Simple AVNMP Hypotheses for Load Prediction.

Various enhancements are added to the initial hypothesis. In this specific case, a running average was used to smooth the data before the extrapolation. The size of the running average defines a hypothesis. Each enhancement is considered a new hypothesis ($H_e$) in this experiment. In Figure 100, for each hypothesis the sum of the error in predictions is graphed as the gray boxes in the lower portion of the graph. The compressed size of the corresponding error is plotted as the black stars in the upper portion of the figure. Clearly, a better hypothesis concerning the origination of the data results in better prediction and greater compression, while poor hypotheses result in inaccurate prediction and reduced compression. This provides a concrete demonstration of the relation between complexity and prediction accuracy.
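The experiment described above, in which each running-average size defines a hypothesis that is scored by its summed prediction error and by the compressed size of that error, can be sketched as follows. This is a minimal sketch on synthetic data, not the AVNMP measurement code: the function names, the history length of 8 samples, the one-step linear extrapolation, and the use of zlib to measure compressed error size are assumptions made for illustration.

```python
import random
import struct
import zlib


def predict_next(history, window):
    """Smooth with a running average of size `window`, then extrapolate one step linearly."""
    smoothed = [
        sum(history[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(history))
    ]
    if len(smoothed) < 2:
        return smoothed[-1]
    return smoothed[-1] + (smoothed[-1] - smoothed[-2])


def evaluate(load, window, history_len=8):
    """Return (sum of absolute prediction errors, compressed size of the errors in bytes)."""
    errors = []
    for t in range(history_len, len(load)):
        predicted = predict_next(load[t - history_len : t], window)
        errors.append(int(round(load[t] - predicted)))
    packed = struct.pack(f"<{len(errors)}i", *errors)
    return sum(abs(e) for e in errors), len(zlib.compress(packed, 9))


# Hypothetical load traces: a clean ramp and the same ramp with random disturbance.
ramp = [5 * i for i in range(200)]
rng = random.Random(1)
noisy = [5 * i + rng.randint(-50, 50) for i in range(200)]

for name, trace in (("ramp", ramp), ("noisy", noisy)):
    for window in (1, 4):
        print(name, "window", window, evaluate(trace, window))
```

When the hypothesis matches the data (window 1 on the clean ramp), the error sequence is all zeros and compresses to almost nothing; for the noisy trace the errors are larger and their compressed size grows with them.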

It is hypothesized that the greater the complexity, the greater the error in prediction, and thus the greater the likelihood of AVNMP rollback. In order to validate this hypothesis, load prediction error from AN-1 (see the experimental configuration shown in Figure 86) is compared with the estimated complexity of the actual load. In Figure 101 the load prediction error is plotted with the estimated complexity versus Wallclock, where values are taken over intervals of the same length as the Sliding Window Lookahead Length shown in Table 8. Larger error, and thus more likely rollback, occurs during periods of relatively high complexity, while complexity is low during periods of low error.

Figure 101. Estimated Complexity and Error within AVNMP.


Prediction  convergence  and  complexity 

Predictions within the State Queue form a sequence that AVNMP is trying to predict. This is represented in more detail in Figure 102. The goal of the rollback mechanism is to cause the predicted values to converge to the best-predicted estimate. In AVNMP, the Driving Process is the model, or MDL hypothesis. The virtual messages generated by the Driving Processes may be active, containing small hypotheses within themselves as previously discussed in the bottom of Figure 97.
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Figure  102.  Converging  Predictions. 


The definition of Global Virtual Time, $GVT(t)$, can be
applied to reason about the information contained in the
State Queue. Consider task execution time ($\tau_{task}$),
which is the time taken by a logical process to generate a
predicted value given an input message, the Wallclock time
at which a particular state was cached ($t_{SQ}$), and
Wallclock time ($t$). Let $P_{SQ}$ be the predicted time
that event SQ will occur. Let $f(x)$ be the prediction
hypothesis of a Driving Process such that $f(x)$ predicts a
value for time $x$ where $x > t$. Consider a predicted value
($V_v$) that is cached at time $t_{SQ}$ in the State Queue
resulting from a particular predicted event. As rollbacks
occur, values for a particular predicted event may change,
converging to the real value ($V_r$). For correct operation
of Active Virtual Network Management Prediction, $V_v$
should approach $V_r$ as $t$ approaches $GVT(t)$.
Explicitly, this is
$\forall \epsilon > 0 \; \exists \delta : 0 < |f(t) - f(GVT(t))| < \epsilon$
implies that $0 < |GVT(t) - t| < \delta$, where
$f(t) = V_r$ and $f(GVT(t)) = V_v$. Because Active Virtual
Network Management Prediction always uses the correct value
when the predicted time ($p$) equals the Wallclock ($t$),
and it is assumed that the predictions become more accurate
as the predicted time of the event approaches the current
time, the reasonable assumption is made that
$\lim_{t \to p} f(t) = V_r$.

In order for the Active Virtual Network Management
Prediction system to always look ahead,
$\forall t : GVT(t) \ge t$. This means that
$\forall n \in \{LP\}, \forall t : LVT_{lp_n}(t) \ge t$ and
$\forall m \in \{M\} : m \ge t$, where $m$ is the receive
time of a message, $M$ is the set of messages in the entire
system, and $LVT_{lp_n}$ is the LVT of the nth Logical
Process. In other words, the Local Virtual Time of each
process must be greater than or equal to Wallclock, and the
smallest message not yet processed must also be greater than
or equal to Wallclock. The smallest message could cause a
rollback to Wallclock. This implies that
$\forall n, t : LVT_{dp_n}(t) \ge t$; that is, the Local
Virtual Time of each driving process must be greater than or
equal to Wallclock.

An out-of-order rollback occurs when $m < LVT(t)$. The
largest saved state time such that $P_{SQ} < m$ is used to
restore the state of the Logical Process, where $P_{SQ}$ is
the time the state was predicted to occur. Then the expected
task execution time ($\tau_{task}$) can take no longer than
$P_{task} - t$ to complete in order for $GVT(t)$ to remain
ahead of Wallclock. Thus, a constraint between expected task
execution time ($\tau_{task}$), the predicted time
associated with a state value ($P_{SQ}$), and Wallclock
($t$) has been defined. As $H_e$ improves there will be a
reduction in the number of rollbacks, a smaller $E$ value in
the packet encoding, and shorter Streptichrons.
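
The look-ahead and rollback conditions above can be sketched
in a few lines of Python. This is only an illustrative
sketch, not the AVNMP implementation: the function names are
hypothetical, the State Queue is assumed to be a list of
(predicted time, value) pairs sorted by predicted time, and
GVT is assumed to follow the usual Time Warp definition (the
minimum over all Local Virtual Times and unprocessed message
receive times).

```python
from bisect import bisect_left


def global_virtual_time(lvts, pending_receive_times):
    """Assumed GVT: smallest LVT or unprocessed message receive time."""
    return min(list(lvts) + list(pending_receive_times))


def looks_ahead(lvts, pending_receive_times, wallclock):
    """True when GVT(t) >= t, i.e. the virtual system is ahead of Wallclock."""
    return global_virtual_time(lvts, pending_receive_times) >= wallclock


def is_out_of_order(receive_time, lvt):
    """An out-of-order rollback is triggered when m < LVT."""
    return receive_time < lvt


def state_to_restore(state_queue, receive_time):
    """Return the saved state with the largest predicted time P_SQ < m.

    state_queue is a list of (predicted_time, value) pairs sorted by
    predicted_time; None is returned if no saved state is early enough.
    """
    times = [p for p, _ in state_queue]
    i = bisect_left(times, receive_time)  # first index with time >= m
    return state_queue[i - 1] if i > 0 else None


def task_fits(tau_task, p_task, wallclock):
    """Constraint tau_task <= P_task - t that keeps GVT ahead of Wallclock."""
    return tau_task <= p_task - wallclock
```

For example, with saved states at predicted times 5, 8, and
12, a straggler message with receive time 10 would restore
the state saved for time 8.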

Self-regulation  via  complexity 

As predictions become more inaccurate in AVNMP, virtual
messages should slow down, rather than burden the system
with potential rollbacks. Poorly predicted messages will
naturally be larger in their minimum size, which slows down
their rate of propagation in proportion to their inaccuracy.

Another issue concerns a mechanism for feedback to the
Driving Process so that its hypothesis can improve. Such a
feedback mechanism can be based upon input from the
complexity estimate, or minimum encoded packet size, of
virtual messages. The hypothesis is adjusted in a manner
that drives the system towards minimizing encoded virtual
message size.
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Active  Virtual  Network  Management  Prediction 
(AVNMP) [16] provides predicted state within each node of a
network based upon a correct estimated operating hypothesis,
$H_e$. The AVNMP architecture provides a network prediction
service that utilizes the capability of Active Networks to
easily inject fine-grained models into a communication
network to enhance network performance. AVNMP, injected into
the network as an active overlay network, is a simulation of
the actual network, but running temporally ahead of the
actual network. AVNMP is capable of modeling load and
propagating state information in a manner that meets the
demand for prediction accuracy at a particular active node
at the expense of overhead due to rollback in order to
correct for prediction inaccuracy. Thus, prediction accuracy
is met at the cost of AVNMP performance, that is, the
ability of AVNMP to predict farther into the future. An
AVNMP application to predict CPU utilization for Active
Networks has been demonstrated in collaboration with the
National Institute of Standards and Technology in [17],
[35], and [36]. The inherently distributed nature of
communication networks and the computational power unleashed
by the Active Networking paradigm have been used to mutual
benefit in the development of the Active Virtual Network
Management Prediction mechanism. Active Networks benefit
from AVNMP by continuously receiving information about
potential problems before they occur. In this paper, the
groundwork is laid for using a Kolmogorov Complexity
estimate to drive the optimal generation and composition of
solutions. System faults are represented in algorithmic
form. Reversible code is then developed to remove the effect
of faults in a system. The application in this paper focuses
on an active network in which information, algorithmic and
static, can be transmitted in a fine-grained manner.

Kolmogorov Complexity, $K(x)$ (see [30] for an introduction
to Kolmogorov Complexity and [18], [21], [28], and [29] for
applications), is the optimal compression bound of string
$x$. This incomputable, yet fundamental, property of
information has vast implications in a wide range of
applications including network and system optimization,
security, and bioinformatics. An active network [16]
provides a suitable environment in which to study the
effects of tradeoffs in algorithmic and static information
representation because an active packet consists of both
code and static data. An active network enables packets to
perform computation at the intermediate nodes in addition to
their communication capabilities. This enables application
developers to design novel network services and protocols
that can trade off communication and computation as the
packet traverses the network. Therefore, to ensure best
performance, active network developers have to effectively
answer the question, "What is the optimal proportion of the
code size in a packet with respect to its data payload?"

A method for obtaining the answer to this question comes
from direct application of a technique called Minimum
Description Length (MDL) [37], [30] to an active packet. Let
$D_x$ be a string of bits representing $x$. Let $H_e$ be a
hypothesis, or algorithm, that attempts to explain how $D_x$
is formed. MDL states that the sum of the length of the
shortest encoding of a hypothesis about the model generating
the string and the length of the shortest encoding of the
string encoded by the hypothesis will estimate the
Kolmogorov Complexity of string $x$. The method separates
randomness from non-randomness in $x$ by incorporating
non-randomness, which is computable, as the shortest encoded
program that represents the original string. The random part
of the string represents the error, that is, the difference
between the original string and the output of the encoded
program. Thus, the goal is to minimize
$l(H_e) + l(D_x \mid H_e) + l(E)$, where $l(x)$ is the
length of string $x$, $H_e$ is the estimated hypothesis used
to encode the string ($D_x$), and $E$ is the error in the
hypothesis. The more accurately the hypothesis describes
string $x$, the shorter the encoding of the string.

An active packet is measured as shown in Figure 103, where
each packet conveys the same information; however, the
length varies with the choice of the proportion of code and
data. This in turn is governed by the hypothesis chosen to
represent the data. The better the hypothesis, the smaller
the "error" made in representing the data. In the figure, H1
is the worst hypothesis, as the "error", and hence the
packet size, is the largest. On the other hand, H4 presents
the best hypothesis for the data, and hence the packet size
is the smallest. The goal is to learn how to optimize the
combination of communication and computation enabled by an
active network. Clearly, if the complexity is estimated to
be high for the transfer of a piece of information, then the
benefit of having code within an active packet is minimal.
On the other hand, if the complexity estimate is low, then
there is great potential benefit in including it in
algorithmic form within the active packet. When this
algorithmic information changes often and impacts low-level
network devices, then active networking provides the best
framework for implementing solutions. In the next section,
the relationship among AVNMP, Kolmogorov Complexity, and
fault behavior is examined in more detail.

Figure 103. Algorithmic content.
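
The trade-off between code and data in an active packet can
be illustrated with a small Python sketch. It is a rough
stand-in rather than the method used in the report:
Kolmogorov Complexity is incomputable, so the sketch assumes
that the length of a hypothesis can be approximated by the
byte length of its source text and that the length of the
error can be approximated by the zlib-compressed size of the
residuals. The function names and the example hypotheses are
hypothetical.

```python
import struct
import zlib


def encoded_error_length(data, predict):
    """Compressed size in bytes of the residuals left by a hypothesis."""
    residuals = [x - predict(i) for i, x in enumerate(data)]
    raw = struct.pack(f"<{len(residuals)}i", *residuals)  # signed 32-bit ints
    return len(zlib.compress(raw, 9))


def packet_size(data, code_text, predict):
    """Approximate active packet size: code (hypothesis) plus error."""
    return len(code_text.encode("utf-8")) + encoded_error_length(data, predict)


def best_hypothesis(data, hypotheses):
    """Name of the hypothesis that yields the smallest packet.

    hypotheses maps a name to (code_text, predict), where code_text is
    the transmitted source and predict(i) returns the i-th predicted value.
    """
    return min(hypotheses, key=lambda name: packet_size(data, *hypotheses[name]))


# Data that a linear model describes exactly.
data = [3 * i + 7 for i in range(200)]
hypotheses = {
    "constant": ("lambda i: 0", lambda i: 0),
    "linear": ("lambda i: 3*i+7", lambda i: 3 * i + 7),
}
winner = best_hypothesis(data, hypotheses)
```

Here the linear hypothesis costs a few more bytes of code
but leaves an all-zero error that compresses to almost
nothing, so it yields the smaller packet.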


5.1  AVNMP  AND  FAULT  PREDICTION

The Active Virtual Network Management Prediction mechanism
(AVNMP) [16] requires the injection of models, or
hypotheses, that describe the operation of the system
assuming no fault exists. An example derivation of a
non-fault-operating hypothesis and its relationship to
Kolmogorov Complexity is discussed in [155]. When Wallclock
time reaches a predicted state time, verification is made to
determine whether the predicted value deviates beyond a
preset tolerance from the actual value. If the prediction
accuracy fails to fall within the tolerance, an
out-of-tolerance rollback occurs. Out-of-tolerance rollbacks
in AVNMP are due to inaccurate prediction and thus are
related to error, or the inability of the hypothesis in MDL
to fully capture all patterns in a string. The rollback
mechanism, which is also correlated to the length of $E$
(Figure 103) in the encoded packet as discussed in [155],
accounts for randomness. That is, randomness is defined as
information that cannot be compressed or defined
algorithmically, and thus cannot be predicted. This results
in a higher Kolmogorov Complexity, as experimentally
validated in [155].

However, a fault occurring in the system will appear as a
deviation from the non-fault-operating hypothesis. The fault
will induce the appearance of greater randomness, or higher
Kolmogorov Complexity, because actual events will not fit
the initial estimated hypothesis ($H_e$). This will cause an
increase in rollback frequency and a longer $E$ value in the
encoded packets. Most Bayesian Belief Networks take the
opposite approach by developing fault hypotheses ($H_f$)
rather than a correct operating hypothesis. Clearly, an
approach based upon handcrafting fault hypotheses assumes
one can predetermine and characterize all possible faults, a
large and difficult task. The goal of this project is
self-healing, in which the system automatically aligns
itself with the correct operating hypothesis in the presence
of unanticipated faults.

A potential complication that arises when AVNMP is viewed in
this manner is that $H_e$ is, by definition, only an
estimate of the correct operation of the actual system.
Rollbacks, randomness, and error ($E$) occur as a result of
the deviation of $H_e$ from $H$, where $H$ is the true and
complete hypothesis describing the system. The question
arises as to how to distinguish between randomness due to a
faulty operating hypothesis and an actual fault. Figure 104
illustrates the dichotomy. In the upper portion of the
figure, the virtual system runs ahead of Wallclock time such
that Global Virtual Time (GVT), that is, the estimate of
time to which the entire AVNMP system has advanced, proceeds
at a faster rate than Wallclock. The physical system, in the
lower portion of the figure, proceeds at the rate of
Wallclock. In order to correct the virtual system,
anti-messages, in the form of reversible code [20], can be
transmitted. This reduces the forward execution rate of the
virtual system in an attempt to bring the virtual system,
based upon an estimated hypothesis ($H_e$), closer to the
actual system operation described by hypothesis $H$.
Alternatively, in the real system, anti-faults in the form
of reversible code can be generated to move the actual
system ($H$) towards the estimated hypothesis ($H_e$).

Figure 104. Self-correcting simulation versus fault
correction within the actual system. (Upper panel: Virtual
System, corrected by anti-messages, $H_e \to H$. Lower
panel: Real System, corrected by anti-faults, $H \to H_e$.
Both are shown against Wallclock.)

The next section considers anti-faults in more detail.
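
The out-of-tolerance check described in this section can be
sketched as follows. This is a minimal illustration with
hypothetical names and made-up sample values, not AVNMP
code: each cached prediction is compared with the value
observed when Wallclock reaches the predicted time, and a
rollback is counted whenever the deviation exceeds the
preset tolerance.

```python
def out_of_tolerance(predicted, actual, tolerance):
    """True when a prediction deviates from the actual value beyond tolerance."""
    return abs(predicted - actual) > tolerance


def count_rollbacks(predictions, observations, tolerance):
    """Number of out-of-tolerance rollbacks over a sequence of verifications."""
    return sum(
        out_of_tolerance(p, a, tolerance)
        for p, a in zip(predictions, observations)
    )


# Illustrative values only: the same predictions checked against a
# healthy trace and against a trace in which a fault appears midway.
predictions = [10.0, 12.0, 14.0, 16.0]
healthy = [10.2, 11.9, 14.1, 16.0]
faulty = [10.2, 11.9, 23.0, 25.5]

healthy_rollbacks = count_rollbacks(predictions, healthy, tolerance=0.5)
faulty_rollbacks = count_rollbacks(predictions, faulty, tolerance=0.5)
```

A rise in the rollback count relative to the healthy
baseline is the kind of deviation from the non-fault
hypothesis that the text associates with a fault.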

5.2  ALGORITHMIC  FAULT  DETECTION  AND 
GENERATION 

There are at least three reasons why an algorithmic
description of a fault is desirable. First, constructing the
smallest algorithmic representation of a fault indicates its
complexity, which is valuable information. Complexity is
important because it is an indicator of the type of fault,
the level of difficulty in correcting the fault, and the
severity of the fault; fault severity is important in triage
operations to optimize system health. Second, a more compact
algorithmic representation of a fault will travel more
rapidly through the network; it is an efficient format for
alerting system management and for triggering automated
solutions. Third, it is relatively easy to reverse the code
of an algorithm, possibly generating an anti-fault, or
solution to a problem, in certain cases. Reversible code has
been presented in previous work as a mechanism for
generating anti-messages in Time Warp simulation. In this
section the behavior of complexity with regard to code and
anti-code is discussed, as well as results leading towards
the use of reversible code for self-composing solutions.

The proposed hypothesis is that the Kolmogorov Complexity of
a combined fault and solution description is minimized when
the optimal solution to mitigate the fault is composed. A
nearly trivial example can be seen with reverse code. Assume
that fault data, $F$, exists. Assume that the fault does not
erase any data but merely transforms it. Define the
algorithmic description of the fault data as $P_F()$. The
reverse code for $P_F()$ will be labeled $RP_F()$. Assume
$P_F()$ and $RP_F()$ are minimal length programs. Then
$RP_F(P_F()) = \emptyset$, where $\emptyset$ is the empty
set. $RF$ is the data generated by $RP_F()$. Since the fault
does not erase any data, the process is reversible [30] and
therefore $K(RF) = K(F)$, or $K(RF) - K(F) = 0$. The
equivalence in complexity of $RF$ and $F$ follows from the
fact that, because there is no loss or gain of complexity
when the system is restored to its prior state using the
anti-fault process $RP_F$, there is no work performed. The
algorithmically reversed fault will be referred to as an
anti-fault in this paper.
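
A toy example may make the reversibility argument concrete.
The sketch below is purely illustrative and the transforms
are hypothetical: one "fault" rotates and offsets bytes
without erasing anything, so a reverse program restores the
original data exactly, while a second fault that erases data
maps different inputs to the same output and therefore has
no reverse.

```python
def fault(data: bytes, shift: int = 3, offset: int = 17) -> bytes:
    """A fault that transforms data without erasing it (a bijection)."""
    if not data:
        return data
    k = shift % len(data)
    rotated = data[k:] + data[:k]  # rotate left by k positions
    return bytes((b + offset) % 256 for b in rotated)


def anti_fault(data: bytes, shift: int = 3, offset: int = 17) -> bytes:
    """Reverse code for fault(): undo the offset, then rotate back."""
    if not data:
        return data
    restored = bytes((b - offset) % 256 for b in data)
    k = shift % len(restored)
    n = len(restored)
    return restored[n - k:] + restored[:n - k]  # rotate right by k positions


def erasing_fault(data: bytes) -> bytes:
    """A fault that erases data: every byte is overwritten with zero."""
    return bytes(len(data))


original = b"network state"
recovered = anti_fault(fault(original))
```

Applying `anti_fault` after `fault` returns the original
bytes, whereas no function can recover the input of
`erasing_fault` from its output alone.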

Consider reversing AVNMP processes in more detail. Details
of AVNMP operation are described in [16]; however, a brief
description is provided here using a Case Diagram. The Case
Diagram shown in Figure 105 describes the AVNMP Management
Information Base (MIB). Arrows indicate information flow;
labeled short lines indicate counters, and labeled arrows
represent information flow that is counted. Active packets
arrive through the active channel to the AVNMP Logical
Process. The total number of packets entering the Receive
Queue is maintained in logicalProcessQRSize. In addition,
the Local Virtual Time is sampled via logicalProcessLVT.
Next the State Queue, which holds past, present, and
predicted state values, is updated. Any rollbacks that may
occur are counted. Next the Send Queue transmits any output
messages generated by the AVNMP Logical Process. The
logicalProcessAntiMessages counter counts anti-messages
separately.

Now consider the effect of reversible code using the SNMP
Case Diagram previously discussed in Figure 105. First, it
is important to distinguish between the Physical and Logical
Process. The Physical Process is the model of the actual
system injected into AVNMP. This is in contrast to the
Logical Process, which is the entire AVNMP supporting
implementation that includes the Physical Process as well as
possible state saving, rollback, and anti-message
capabilities. Note that it is the Physical Process, that is,
the object being modeled, that must be reversed, not the
AVNMP Logical Process described in Figure 105. The Case
Diagrams shown in Figure 106 and Figure 107 represent the
Physical Process. A process operating in reverse would be
required to effectively reverse all arrow directions,
resulting in an effective decrement of counters. This is the
purpose of rollback and anti-messages. By chasing the
original message, the anti-message may cause additional
rollbacks to occur, which, in a state saving system, cause
previous state values to be restored to a known valid state.
When implemented using reversible code, the anti-message
actively undoes the effect of the original message(s). The
original messages to be rolled back are input to a reverse
code version of the physical process in reverse order of
their Receive Times. Instead of reversing a process that
persistently resides on a node, one could imagine reversible
active packets. The next section discusses a specific
application of anti-faults, pointing out potential
disadvantages with such a technique.

Figure 105. AVNMP SNMP Case Diagram.

Figure 106. Case diagram for IP.

Figure 107. Statistics maintained for each interface.


5.3  DISTRIBUTED  DENIAL  OF  SERVICE
EXAMPLE

In the previous section, it was suggested that
general-purpose fault correction might be automated by
reversing the algorithmic fault description. This section
considers the effect of such a mechanism relative to a
particular fault, namely a Distributed Denial of Service
(DDoS) attack. We will see that simple fault reversal may
not always be the best solution, particularly when
irrevocable events have occurred, such as the theft of
resources. Consider a DDoS attack as a fault. Kolmogorov
Complexity has been used to detect likely DDoS attacks in
[154]. Note the assumption that this is an Active Network;
thus, the DDoS attack can consume both bandwidth and
processing. Referring back to Figure 104, one can see that
AVNMP continuously updates predictions on anticipated load
throughout the system based upon legitimate network use. In
the situation illustrated in Figure 106, a simplified
snapshot of packet forwarding is illustrated in which
packets entering network interfaces x, y, and z are
forwarded outward through interfaces a, b, and c. The Case
Diagrams in Figure 106 and Figure 107 represent a more
detailed model of information flow. Remember that these
flows describe the Physical Process as mentioned in the
previous section. As protocol data units move through the
network interfaces, the Simple Network Management Protocol
(SNMP) counters shown in Figure 108 will maintain current
state in separate MIB table rows for each interface.

Figure 108. Traffic through interfaces.

A hypothetical set of traffic load graphs is shown in
Figure 109. A simple algorithmic form of the load might be
as shown in the figure at time $t$, namely
$a = x + 2y + 10z$, where x, y, and z are output interfaces
and the numbers indicate load, in number of packets, on
those interfaces. The fault data ($F$) is the difference
between the load in the AVNMP and Actual graphs. The
algorithmic description of the fault ($P_F()$) is the code
that forwards nine packets from interface x to interface a.
The reverse code ($RP_F()$) would transmit nine packets from
a back to x, essentially reflecting the attack back towards
the source as illustrated in Figure 110. The authors
recognize that simply reflecting packets would increase
network load, effectively increasing the impact of an
attack. This is a disadvantage of blindly using anti-faults:
in this case, the fault process performed the irreversible
actions of using up bandwidth as well as processing units.
The best option in this case might have been to simply
attempt to quench attack packets while moving closer towards
the source of the attack. This would bring the actual system
closer to the expected operation provided by the AVNMP
model, namely $H \to H_e$.

Figure 109. Predicted versus actual load. (Panels: AVNMP and
Actual; the annotated difference between the two loads is
9x.)

Figure 110. Reflecting the attack via code reversal.
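
The arithmetic of this example can be sketched in Python.
The sketch rests on stated assumptions rather than on
anything given in the report: the actual load is taken to be
$10x + 2y + 10z$ so that it differs from the predicted load
by $9x$, and the traffic cost of each response strategy
(none, reflect, quench) is a deliberately crude model with
hypothetical names.

```python
def predicted_load(x, y, z):
    """Load on interface a predicted by the AVNMP hypothesis."""
    return x + 2 * y + 10 * z


def actual_load(x, y, z):
    """Assumed actual load under attack, chosen to differ by 9x."""
    return 10 * x + 2 * y + 10 * z


def fault_packets(x, y, z):
    """Fault data F: difference between actual and predicted load."""
    return actual_load(x, y, z) - predicted_load(x, y, z)


def extra_link_traffic(fault, strategy):
    """Crude model of excess packets on the x-a link for each response."""
    if strategy == "none":
        return fault          # attack packets flow from x to a
    if strategy == "reflect":
        return 2 * fault      # naive anti-fault sends them back as well
    if strategy == "quench":
        return 0              # assumed: attack packets dropped near source
    raise ValueError(f"unknown strategy: {strategy}")


fault = fault_packets(1, 1, 1)
costs = {s: extra_link_traffic(fault, s) for s in ("none", "reflect", "quench")}
```

Under these assumptions, reflecting the nine fault packets
doubles the excess traffic on the link, while quenching
removes it, which matches the qualitative argument in the
text.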


The goal of the anti-fault was to place the system back to a
healthy state that existed before the fault occurred. The
naive anti-fault described above, while relatively easy to
implement, attempted to do this by reversing events, some of
which were irreversible. For example, once a resource, such
as bandwidth or CPU, has been stolen at a particular instant
of time, it cannot be returned. Additionally, the attempt to
transition the system state backward in time to a healthy
condition temporarily increased the impact of the fault.
Research required to achieve the effective reversal of
faults in a more controlled manner using complexity is
outlined next. A Swarm simulation of system complexity is
used to study the relation to Kolmogorov Complexity.

5.4  TOWARDS  COMPLEXITY-BASED 
SOLUTION  COMPOSITION 

This section discusses a general approach for self-composing
solutions using lessons learned from the previous section.
The approach can be described as the automated generation of
a solution hypothesis $H_s = R(H_e - H_f)$, i.e., the
reverse of the algorithmic difference between the faulty and
correct algorithmic representation of behavior by controlled
means. As $H$ deviates from $H_e$, heat [28], [29], or
complexity as presented here, is generated. In [28] and [29]
the relationship between fault and energy is explored and
simulated (see [18] and [21] for recent work on complexity
and energy and Information Assurance). It is useful to
briefly describe [28] in order to provide a tangible
background and explanation for the algorithmic fault
detection approach. The motivation for that experiment came
from the relationship between Kolmogorov Complexity and
entropy. The definition and application of Kolmogorov
Complexity to vulnerability analysis (discussed in [21])
identified how Kolmogorov Complexity can be used to
determine vulnerabilities in a system as areas of low
complexity. An underlying hypothesis of our work is that
computation and communication are fundamentally related and,
conversely, bandwidth and processing denial of service are
fundamentally interrelated. Low complexity data or code
consuming large amounts of bandwidth or processing indicates
the likelihood of an attack. A model of complexity evolution
within a closed system is described in reference [18]. That
reference developed an abstract model with which to study
complexity, specifically Kolmogorov Complexity, of
information within an information system. That model
explores $K(x)$, a measurement of length in bytes, and
$K(x)/s$, a measure of the maximum increase in complexity of
the system due to code entering a system, such as code
carried by active packets. The rate of complexity increase,
in terms of algorithmic active packet complexity, within the
closed system was measured. Significant changes in system
complexity indicate the presence of faults. Reference [154]
reported the results of Kolmogorov Complexity probes that
detect Distributed Denial of Service attacks.

Figure 111. Selected Mathematica complexity functions.

Complexity estimation mechanisms for these experiments have
been developed in Mathematica using a package developed
specifically for the study of complexity, particularly
within active networks. This package contains several
functions for the estimation of complexity, shown in
Figure 111, including a finite automata minimization
technique and an entropy-based compression technique. The
package also contains a framework for simulating
transmission of data in user-controlled combinations of
algorithmic and passive forms within active packets. After
testing in Mathematica, the implementation has been
integrated into an active network [16] as Java code that can
be easily inserted into Magician [31] active packets. The
complexity probe returns an estimate of the smallest
compressed size of a string. It is based upon computing the
entropy of the weight of ones in a string. Specifically, it
is defined in Equation 15A, where $x\#1$ is the number of 1
bits and $x\#0$ is the number of 0 bits in the string whose
complexity is to be determined. Entropy is defined in
Equation 15B. The expected complexity is asymptotically
related to entropy as shown in Equation 15C. See [153] for
more advanced measures of empirical entropy and their
relationship to Kolmogorov Complexity.

$$\hat{K}(x) \approx l(x)\, H\!\left(\frac{x\#1}{x\#1 + x\#0}\right) + \log_2(l(x)) \qquad \text{(15A)}$$

$$H(p) = -p \log_2 p - (1.0 - p) \log_2 (1.0 - p) \qquad \text{(15B)}$$

$$H(X) \approx \sum_{l(x) = n} P(X = x)\, K(x) \qquad \text{(15C)}$$
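
A minimal Python sketch of the entropy-based probe follows.
It assumes the estimator takes the form
$l(x)\,H(x\#1 / (x\#1 + x\#0)) + \log_2 l(x)$ with $H$ the
binary entropy function, as read from Equations 15A and 15B;
the function names are hypothetical, and this is not the
Mathematica or Java implementation referred to in the text.

```python
import math


def binary_entropy(p: float) -> float:
    """H(p) = -p log2 p - (1 - p) log2 (1 - p), with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def complexity_estimate(bits: str) -> float:
    """Estimate the compressed size, in bits, of a string of '0'/'1' characters.

    Uses the weight of ones: l(x) * H(x#1 / (x#1 + x#0)) + log2(l(x)).
    """
    n = len(bits)
    if n == 0:
        return 0.0
    ones = bits.count("1")
    return n * binary_entropy(ones / n) + math.log2(n)


all_zeros = complexity_estimate("0" * 64)
alternating = complexity_estimate("01" * 32)
```

Note that the estimator only sees the proportion of ones: a
perfectly regular alternating string receives the same
estimate as a random string with equal numbers of ones and
zeros, which is one reason more advanced empirical entropy
measures are of interest.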

In references [18] and [28], a Swarm simulation containing
agents representing data points from a system, specifically,
Simple Network Management Protocol (SNMP) Management
Information Base (MIB) object values from a communication
network, were programmed to initially move in a randomized
manner. Thus, the agents began with high location entropy
and no detectable pattern formation resulted initially. This
mechanism was used to maximize the ignorance of the prior
probability, which is also the purpose of the universal
probability, $M()$ [27]. The location distribution of Swarm
agents, $x$, represented the health of the system. The
predictive capability can be viewed as the ability, given
$x$, to predict the type and severity of a fault condition
pattern, $y$. The goal is to predict $M(y \mid x)$. From
Bayes Theorem [30] the problem can be stated as shown in
Equation (16). Figure 112 shows the relationship among AVNMP
hypothesis deviation, entropy, and complexity in the
feedback mechanism designed to maintain system health.

Figure 112. Hypotheses, complexity, and entropy in
anti-fault generation.

$$M(y \mid x) = \frac{M(xy)}{M(x)}$$

Equation (16)


Faults, representing MIB object values that operated outside
of a preset threshold, generated heat proportional to the
amount by which they exceeded the threshold, $|H_e - H|$.
The agents were attracted towards the location of heat. An
accurate heat propagation model was used in the simulation
to model heat dissipation within a finite two-dimensional
grid upon which the agents resided. With the generation of
heat, the agents moved in a consistent direction towards the
heat, and then clustered together in a circular pattern
around the heat, resulting in a loss of entropy. In a sense,
the introduction of entropy via heat energy caused a
reduction in entropy of agent location. Equation (17) and
Equation (18) indicate that the most accurate prediction for
fault pattern location distribution $y$ is one that
minimizes the difference in length between the program that
generates $x$ and the program that generates $xy$. Clearly,
in the perfect operation scenario, movement was programmed
by default to be as randomly generated as possible. The
program size required to define the location of each agent
was on the order of the size of the entire grid. As heat was
increased, cluster patterns increased in number and size,
causing the location distribution to be describable by
smaller formulae, thus lower complexity. The cluster
patterns were hypothesized to represent the type of fault,
while the complexity, or size of the algorithmic description
of the cluster patterns, estimated the severity of the
fault.

$$\log M(y \mid x) = \log M(xy) - \log M(x)$$

Equation (17)

$$\lim_{x \to \infty} -\log M(y \mid x) = Km(xy) - Km(x) - O(1)$$

Equation (18)
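
The idea that the best prediction $y$ minimizes the
difference between the description length of $xy$ and that
of $x$ can be illustrated with an off-the-shelf compressor.
This is a loose, assumption-laden sketch: zlib's compressed
size is used as a computable stand-in for $Km$, which it
only crudely approximates, and the function names and
candidate patterns are hypothetical.

```python
import zlib


def description_length(data: bytes) -> int:
    """Compressed size in bytes, standing in for a complexity measure."""
    return len(zlib.compress(data, 9))


def conditional_cost(x: bytes, y: bytes) -> int:
    """Approximate K(xy) - K(x): extra description needed for y given x."""
    return description_length(x + y) - description_length(x)


def most_likely_continuation(x: bytes, candidates):
    """Pick the candidate y with the smallest extra description length."""
    return min(candidates, key=lambda y: conditional_cost(x, y))


# An observed pattern and two candidate continuations of equal length.
observed = b"abc" * 100
candidates = [bytes(range(60)), b"abc" * 20]
chosen = most_likely_continuation(observed, candidates)
```

The continuation that repeats the observed pattern adds
almost nothing to the compressed length and is therefore
selected, in the spirit of Equations (17) and (18).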

The Swarm complexity model is only as good as its ability to
reflect complexity changes in an actual system. An actual
system contains many subtle correlations that may be
impossible to fully specify in detail. However, let us begin
with a controlled experiment involving a DDoS attack. Swarm
agent location over time represents complexity in this
experiment. Swarm heat is a control mechanism that
effectively introduces correlation and reduces complexity.
The goal is to model information flows in which Swarm agents
represent blocks of time-averaged active packets in the DDoS
differential complexity window described in [154]. Heat
represents potential correlation that indicates a potential
DDoS attack. We begin by calibrating Swarm heat parameters
to match known Magician DDoS complexity estimations. The
attacker varies the correlation within the Magician attack
stream. The Swarm model is correspondingly varied and the
results compared and contrasted with the actual Magician
system.

5.5  SUMMARY 

A Kolmogorov Complexity estimate is used to drive the
optimal generation and composition of solutions. System
faults are represented in an algorithmic form. Reversible
code is then developed to remove the effect of faults in a
system. The application in this paper focuses on an active
network in which information, algorithmic and static, can be
transmitted in a fine-grained manner.

6.1  ANALYSIS  OF  THE  EVOLUTION  OF  COMPLEXITY

A  defender  would  like  to  know  not  only  the  average 
complexity  of  the  system,  but  also  the  change  in 
complexity  over  time.  The  goal  of  this  section  is  to 
attempt  to  address  the  macroscopic  behavior,  or  evo¬ 
lution,  of  complexity  in  order  to  understand  how 
complexity  relates  to  vulnerability  and  the  ever- 
increasing cycle of attacker/defender complexity as
each  improves  their  capabilities.  Even  if  complexity 
can  be  measured  at  various  locations  within  an  infor¬ 
mation  system  as  discussed  later  in  this  report,  the 
system  may  evolve.  It  is  critical  to  know  if  bounds  on 
complexity evolution exist so that complexity
"probe" locations can be optimized. Understanding
the  evolution  of  complexity  can  also  lead  to  optimal 
sampling  of  the  probes  so  that  they  are  not  over-sam¬ 
pled,  causing  wasted  network  resources. 

Using Definition 6.4, the evolution in complexity can be crudely
characterized by placing initial estimated values into Definition 6.4
and recursively computing the complexity of the output y = p(x). The
output, y, is a data bit-stream and can sometimes be an executable
program, p(). The resulting evolution is a series of the form {y}, in
which each element is the output of one such recursive step. The
characterization of this series depends upon the initial values in
Definition 6.4, namely L(p), K(x), c, and the magnitude of the
inequality from which K(y) is derived.

In order to provide convenient measurements for comparison with the
macroscopic results discussed later in this report, the metrics are
defined in a manner independent of factors influenced by the operation
of the system itself. These factors include such parameters as
wall-clock time, number of program or data bit-strings, or initial
location, rate, and direction of movement of entities within the
system. The metrics analytically derived from Definition 6.4 include
E[K̂(y)] and E[L(y)], where K̂ is the estimated Kolmogorov complexity
and E[·] is the expected value. The number of cycles, or evolutions,
of Definition 6.4 is used to control the termination of the analysis.
In the emulation described later in this report, a program can
terminate early or continue executing forever; both are the results of
input not being what is subjectively called a valid program. This
analysis assumes that neither event happens. Both events result in
fewer bit-strings in the actual system than in the analysis. The
manner in which these events are handled in the actual emulation is
discussed in more detail later in the report.

The analytical results obtained are shown in Figure 113 using
Definition 6.4, where the constant and inequality are determined from
a compression-based estimate of a single specific program, input, and
output complexity measurement. The more inefficient the program, where
program efficiency is K̂(p)/L(p), the more rapidly the complexity can
evolve per program execution. The complexity estimates used in
Figure 113 are from the program used in the emulation described later
in this report.

Figure 113. Definition 6.4 with Estimated Complexities for the Data
and Program.

Given the means to monitor estimates of complexity
within an actual, highly dynamic, and evolving
information system, trends that support the
theoretical  bounds  on  complexity  should  be  observ¬ 
able.  The  next  section  attempts  to  construct  such  a 
system  by  obtaining  measurements  of  bit-string 
lengths,  complexity,  and  Turing  Machine  program 
transitions  executed  in  a  highly  dynamic  environ¬ 
ment  in  which  programs,  data,  and  machines  come 
together  sharing  information  in  a  large-scale  system. 

Emulation  of  complexity  evolution 

The effort to implement a model of the evolution of complexity is
called emulation rather than simulation because the computation and
complexity under analysis are part of the model itself. The primary
goal of this emulation is to examine trends in the evolution of
complexity in a highly complex interconnection of information systems.
In particular, an initial set of carefully controlled environmental
parameters should include initial program and data complexities and
lengths as well as the rate of exchange and transport of information.
Given an initial set of programs and data bit-strings let loose to
execute in a closed environment, how does the complexity evolve? At
what rate does complexity increase and why? What affects its rate of
increase and why? What would cause it to decrease, at what rate, and
why? What is the relationship between the microscopic level, that is,
a single program execution, and the macroscopic level? What is the
relationship between the complexity of a program and the complexity of
the output it generates? Does the microscopic change in complexity
hold at the macroscopic level and can more detail be deduced about the
bound on complexity that it describes? What is the relationship of
program execution, that is, Turing Machine state transitions, to
output complexity? How do controls on information exchange affect the
evolution of complexity? How does a set of programs, some that
increase complexity and some that decrease complexity, affect the
evolution of the total system complexity? This work attempts to lay
the groundwork for answering these questions by comparing and
contrasting the microscopic and macroscopic levels of complexity.

The system used for experimental study of complexity should mirror a
large-scale, highly dynamic, yet easily controlled environment that is
representative of actual information exchange and evolution in a
variety of real-world information systems. The approach taken in this
effort to understand the evolution of complexity is illustrated in
Figure 114. A Swarm (http://www.swarm.org) model has been developed
that provides an open framework for experimentation on Complexity
Theory and its relationship to information assurance. The Swarm
emulation has been programmed with three types of entities: Turing
Machines, programs, and data bit-strings. Multiple entities of each
type exist and are represented in Figure 114. The program is
represented by a set of states connected by transitions, the
bit-string data by binary digits within a rectangular array, and the
Turing Machine by a computer chip. Each entity is placed in a random
location within a field of attractive force, causing data bit-strings
to be attracted towards programs and programs towards Turing Machines.
The system can be viewed as a two-dimensional grid, shown in
Figure 115, with a possibility of four types of objects residing on
any grid location, as shown in Table 9. Note that all interactions and
control decisions are made by the local entities; there is no global
control.

Figure 114. Emulation Components and High Level Dynamics.

Figure 115. CyberSwarm Simulation at 100 and 300 Time Units.

Table 9  Entities and Their Representation in the Emulation

Entity                    Representation
Turing Machine            Yellow (only when active)
Tape                      Green (lighter shading when more complex)
Turing Machine Program    Green (lighter shading when more complex)
Heat                      Red (darker when hotter)

The motivating factor for each of the entities is heat. Each entity
generates a small amount of heat that is determined probabilistically
within a range specified as a startup parameter. Heat can represent
the initial flow of information that causes one entity to become
interested in working with another and form groups that are not too
small or too large. Using energy to represent information is discussed
further in [24]. The mechanism that implements this clustering
behavior is that entities are endowed with a desire to maintain an
ideal temperature and will cluster together via a randomized movement
pattern in a direction seeking to maintain their ideal temperature.
Heat diffuses through the system in a realistic manner. The brightness
indicates the estimated complexity of an entity: dark green indicates
a low complexity, brighter green indicates a higher complexity. In
order to provide a preview of how this emulation is designed,
Figure 115 shows the emulation at times 100 and 300 units
respectively. These figures should be viewed in color in order to see
the full interpretation. If a Turing Machine, program, and data input
tape meet, the program is executed and the program output tape
generated by program execution is added to the system, available for
use as data input in further computations. Figure 115 highlights two
clusters: the circled cluster on the left is dark because the
bit-strings have low complexity and no Turing Machine to enable
computation, while the right cluster is brighter because it contains
more computational activity and growing complexity per bit-string. The
clusters are attracted to one another and eventually form a single
larger, brighter cluster.

There are two levels of abstraction occurring simultaneously in this
system. The first level is individual goal-directed behavior and
information exchange; the second is computation. In the first level of
abstraction, heat is a representation of motivation for movement
towards common areas. The entities are initially located randomly
throughout the two-dimensional space and gradually cluster together
seeking to reach an ideal temperature. There is complexity in the
location and movement of the entities. Movement towards common areas
allows information exchange to occur. At this level of abstraction,
concepts such as access control in the exchange of information can be
explored; however, that is outside the scope of this report. The
second level of abstraction is focused on computation. That is the
primary focus of this report. The first level of abstraction provides
the highly dynamic system in which the second level, computation,
takes place.

The  Turing  machine 

The Turing Machine entity enables a program consisting of sets of
states and transitions to operate. The Turing Machine is one of the
most fundamental general computing abstractions and is well known in
computer science, having a rich theory of its own that this report
intends to utilize to its advantage. The Turing Machine consists of a
seven-tuple (Q, T, I, δ, b, q0, qf). Q is a set of states, T is a set
of tape symbols, I is a set of input symbols, b is a blank, q0 is the
initial state, and qf is the final state. δ is the next-move function;
it maps a subset of Q × T^k to Q × (T × {L, R, S})^k, where k is the
number of tapes. L, R, and S indicate movement of the tape to the
left, to the right, or remaining stationary, respectively. There can
be multiple tapes. Given a current state and tape symbol, δ specifies
the next state, the new symbol to be written on the tape, and the
direction to move the tape. The set of symbol strings that lead to an
accepting state (qf) is the input language. The Turing Machine object,
when loaded with a valid program, can execute the program when
provided with an input tape. A program resides on a tape; the only
difference between a program and a tape is that a program is a set of
instructions that the Turing Machine can interpret as a program. If
the program fails, that is, it is not a syntactically valid program,
the Turing Machine will eject it. If the time taken by program
execution on a Turing Machine exceeds a specified time limit, the
program is forcibly terminated. The Turing Machine entity enables the
execution of programs described in the next subsection. The Turing
Machine entity's complexity is not included in any of the complexity
related measurements.

The  program 

Valid syntax in the program implementation consists of a set of
states, Q, where each state, q, is defined as a set of tuples:
{ ... (value-to-write, tape-direction, next-state) ... } such that
there is one tuple per alphabet symbol. The alphabet, I, in this
emulation is assumed to be a set of integers starting from zero. I
maps onto q such that an input value read from the tape points to the
correct (value-to-write, tape-direction, next-state) tuple within
state q. Note that the value-to-write is written before the tape is
moved. The tape-direction (L, R, and S) is represented as (0, 1, and
2). A next-state of negative one (-1) indicates a valid end of program
and is included in length and complexity measures of the program. The
initial program has an estimated complexity of 0.7917 and length of
24 bytes.
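
The program format described above can be illustrated with a minimal
interpreter sketch. It is not the emulation's own code: the function
name run_program, the list-of-lists layout, the step limit standing in
for the time limit, and the choice to move a head index rather than the
tape are all assumptions made for illustration. Only the tuple layout,
the (0, 1, 2) direction encoding, the -1 end marker, the
write-before-move order, and the zero blank come from the text.

```python
from collections import defaultdict

LEFT, RIGHT, STAY = 0, 1, 2   # tape-direction encoding given in the text
HALT = -1                     # next-state value marking a valid end of program


def run_program(program, tape_values, max_steps=1000):
    """Run `program` on a tape; return (cells, transitions, halted).

    program[q][symbol] is a (value_to_write, tape_direction, next_state)
    tuple, with one tuple per alphabet symbol in every state q.
    max_steps is a hypothetical stand-in for the emulation's time limit.
    """
    # Blank cells read as zero; the tape is unbounded in both directions.
    tape = defaultdict(int, enumerate(tape_values))
    touched = set(range(len(tape_values)))   # cells initially present
    state, head, steps = 0, 0, 0
    while state != HALT and steps < max_steps:
        symbol = tape[head]
        if not 0 <= state < len(program) or not 0 <= symbol < len(program[state]):
            raise ValueError("not a syntactically valid program")
        value, direction, state = program[state][symbol]
        tape[head] = value                   # written before the move
        touched.add(head)
        if direction == LEFT:
            head -= 1
        elif direction == RIGHT:
            head += 1
        steps += 1
    # Only cells that were initially present or written are reported.
    cells = [tape[i] for i in sorted(touched)]
    return cells, steps, state == HALT


# One-state example over the alphabet {0, 1, 2}: swap 1 and 2 while
# moving right, and end the program on the first blank (0).
swap = [[(0, STAY, HALT), (2, RIGHT, 0), (1, RIGHT, 0)]]
cells, steps, halted = run_program(swap, [1, 2, 2, 1])
```

Returning the transition count alongside the output keeps the two
quantities that the later figures compare (state transitions and
bit-string length) available from a single run.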

The  input  and  output  tapes  for  the  Turing  machine

The implementation of the tape, T, is simply a string of numbers. An
example input tape looks like (... 1, 2, 3 ...). The tape is
bi-directionally infinite; movement can occur infinitely to the left
or the right. Zero (0) is returned when blank spaces are read from the
tape. The length and complexity of a tape include only the values that
were initially on the tape or written to the tape during a program
execution, and not the infinity of zeroes in either direction. In
order to prevent explosive growth in the amount of data generated
during an execution of the emulation, program and data tapes are
implemented as circular queues. Once a specified tape length is
reached, the oldest data is overwritten. The initial input data has an
estimated complexity of 1.0 and the estimated output data complexity
is 0.8571. The initial input length is 5 bytes and the output has a
length of 7 bytes.
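
The circular-queue behavior described above can be sketched with a
bounded double-ended queue. The helper name make_tape and the small
capacity used in the example are assumptions for illustration; the
report states only that the oldest data is overwritten once a
specified length is reached.

```python
from collections import deque


def make_tape(values=(), capacity=100):
    """Return a bounded tape; appending beyond capacity drops the oldest values."""
    return deque(values, maxlen=capacity)


# A capacity of 5 keeps the example short.
tape = make_tape([1, 2, 3], capacity=5)
tape.extend([4, 5, 6, 7])     # 1 and 2 are the oldest values and are overwritten
contents = list(tape)         # [3, 4, 5, 6, 7]
```

A bounded deque gives the overwrite-oldest behavior without explicit
index arithmetic, which keeps the bound on tape length easy to verify.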

The  attractive  force 

Heat is the attractive force that pulls the three main object types
together. The objects generate heat and attempt to maintain an ideal
temperature by grouping together. They can also be repelled if the
temperature is above the ideal temperature. An accurate model of heat
diffusion on the two-dimensional grid is included as part of the
simulation. Heat generation, evaporation rate, and the diffusion
constant within the two-dimensional grid are specified in Table 10. A
discrete approximation to diffusion is used from Definition 7.1, where
nbdavg is the weighted average of the eight adjacent neighbors.

Heat is initially set uniformly for each bug from the range specified
in Table 10. The motivation for an entity to move is determined by its
unhappiness as defined in Definition 7.2.
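
As a rough illustration of a discrete diffusion step and a normalized
unhappiness value, the sketch below may help. Definitions 7.1 and 7.2
are not reproduced in this section, so the update rule
evap_rate * (here + k * (nbdavg - here)), the equal neighbor weights,
the wrap-around grid edges, and the max_heat normalizer are all
assumptions rather than the report's definitions. The default
evaporation rate and diffusion constant are the Table 10 values.

```python
def diffuse(heat, evap_rate=0.99, k=1.0):
    """One synchronous diffusion step on a wrap-around 2-D grid of heat values."""
    rows, cols = len(heat), len(heat[0])
    new = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Assumed: plain mean of the eight adjacent cells (equal weights).
            total = sum(
                heat[(r + dr) % rows][(c + dc) % cols]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            )
            nbdavg = total / 8.0
            here = heat[r][c]
            # Assumed rule: relax toward the neighborhood mean, then evaporate.
            new[r][c] = evap_rate * (here + k * (nbdavg - here))
    return new


def unhappiness(ideal_temp, heat_here, max_heat):
    """Distance from the ideal temperature, scaled by a hypothetical max_heat."""
    return abs(ideal_temp - heat_here) / max_heat


grid = [[0.0, 0.0, 0.0], [0.0, 80.0, 0.0], [0.0, 0.0, 0.0]]
after = diffuse(grid)
```

With a diffusion constant of 1.0 each cell takes on the evaporated
mean of its neighbors, so a single hot cell spreads to all eight
neighbors in one step under these assumptions.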

Table 10  Emulation Control Variables

Emulation Variable                                           Value
Initial length of data and program strings in bytes         (5, 24)
Initial spatial distribution of strings, programs,
  and Turing Machines                                        Random
Proportion of strings, programs, and Turing Machines        (0.3, 0.3, 0.4)
Functional activity of programs                              Shuffle Input Data
Heat evaporation rate and diffusion constant                 (0.99, 1.0)
Heat generation of entities chosen uniformly from
  the range                                                  3,000-10,000
Ideal temperature chosen uniformly from the range            17,000-31,000
Emulation run time in simulation time units                  400

This report presents results relating to the search for a fundamental
understanding of information assurance. The results from this research
enable a deeper understanding that leads towards quantification of
vulnerabilities; measurement, control, and composition of information
security safeguards; and new types of safeguards. The main result of
this work is a unified view that enables information assurance to be
engineered as an integrated feature of an information system. Results
from this work are integrated with an existing tool, called NIPAT,
which displays a measure of the security of a system through an
electrical engineering paradigm (Section 3). Other attempts have been
made to represent information assurance in similar grid formats. NIPAT
is used as a representation of generic grid-based techniques so that
the strengths and weaknesses of such approaches can be determined.

Figure 116 illustrates the approach taken in this research towards
building a foundation for engineering information assurance. This work
began by developing a computational model upon which to test and build
information assurance concepts. In the next layer of the information
assurance framework, a feasible metric was developed through
interaction with the information assurance model. In conjunction with
the information assurance model and metric space developed through the
induced topological framework, insights gained through viewing
information as a physical phenomenon were developed. Electrical
engineering and brittle systems analysis are specific examples,
tangible to users, of these physical insights. While Figure 116
appears as a simple construction with each layer building upon the one
beneath it, in reality the research has been an iterative process
where one layer is used to help support or validate another layer. As
a specific example, the Turing Machine may be viewed as a specific
computational model, Kolmogorov Complexity as inducing a metric space,
and Brittle Systems, derived from materials science, as an analysis of
fundamental tradeoffs in the information assurance performance of a
system. Physics of information has contributed towards understanding
the conservation of complexity by providing the insight to look for
conserved properties related to information assurance.
Complexity-based vulnerability analysis (Section 9) sits at the level
of the induced topological framework. Complexity-based vulnerability
analysis plots the complexity measure of a system. Using this
information, the conservation principle from physics of information,
brittle systems analysis, or electrical engineering principles may be
applied in order to engineer the desired properties into the
information system. Learning from the strengths and weaknesses of the
above approaches, this research has driven deeper into the nature of
information assurance through a series of original hypotheses and
theorems. The complexity-based approach derived through this series of
hypotheses and theorems provides a general unified theory for
understanding the fundamentals of information assurance. The results
from our complexity-based approach can be used synergistically with
other approaches.

Figure 116. Architectural layers for information assurance.

This  report  begins  with  a  brief  overview  of  rele¬ 
vant  research  that  has  attempted  to  understand  the 
properties  of  information  in  terms  of  well-defined 
scientific  and  engineering  disciplines.  Next,  the 
desirable  properties  of  a  metric  for  security  are 
examined  (Section  3.3).  In  order  to  further  the 
development  of  a  realistic  metric,  a  general  model 
for  studying  information  assurance  is  proposed  (Sec¬ 
tion  4).  Next,  a  definition  of  vulnerability  is  pro¬ 
posed  in  terms  of  the  new  model  based  on  Turing 
Machines  (Hypothesis 4.1),  and  engineered  proper¬ 
ties  of  information  assurance  with  an  analogy  to 
mechanical  engineering  are  proposed  in  terms  of 
the new model. The analogy with mechanical engineering
is called Brittle Systems (Section 5) and
involves the design of information assurance in a
manner  that  accounts  for  tradeoffs  in  performance 
and  degradation  of  information  assurance  in  a  sys¬ 
tem.  Information  assurance  is  also  viewed  from  the 
perspective  of  set  theory  and  a  topological  space 
(Section 3.5). This is particularly relevant in understanding
the operation of the metric in terms of
secure  composition  and  the  limits  of  applying  safe¬ 
guards  to  a  system. 

Continuing the outline of the rest of the report, a key contribution
of this effort, the development of a particular information metric, is
presented. Before this point, the report has examined in detail only
the properties of a metric, not an actual metric. The metric proposed
is Kolmogorov Complexity (Section 6). The advantages and drawbacks of
this metric are discussed, including its incomputable nature. However,
computable estimates (Section 6.2) of Kolmogorov Complexity are
proposed next, as well as additional useful applications of Kolmogorov
Complexity for communications in general. These additional
applications are important because they demonstrate how information
assurance should be an integral part of information system design.
Next, Theorems 6.1 and 6.2 concerning the conservation of complexity
(Section 6.7) within an information system are discussed. This leads
to a Swarm experiment that monitors the evolution of complexity in a
dynamic and complex system and examines our ability to monitor the
complexity as it evolves. Unless vulnerabilities can be identified and
measured, the information assurance of a system can never be properly
designed or guaranteed. Results from a study on complexity evolving
within an information system using Mathematica (Section 9.2), Swarm,
and a new Java complexity probe toolkit (Section 9.4), developed by
this project, are presented in this report. An underlying definition
of information security is hypothesized (Hypothesis 9.1) based upon
the attacker and defender as reasoning entities, capable of learning
to outwit one another. This leads to a study of the evolution of
complexity in an information system and the effects of the environment
upon the evolution of complexity. Understanding the evolution of
complexity in a system enables a better understanding of how to
measure and quantify the vulnerability of a system. Finally, the
design of the Java complexity probes toolkit under construction for
automated measurement of information assurance is presented
(Section 9.5). Appendix A presents a dialog in which typical questions
about the relationship between complexity and information assurance
are posed and answered. This dialog is best read after reading the
introduction to Kolmogorov Complexity (Section 6.1) or by someone
already familiar with complexity theory who wants a quick overview of
the approach taken on this project toward the relationship between
complexity and information assurance. Appendix B presents the design
for an experiment that could be run to validate the complexity-based
vulnerability analysis concept (Section 9). Appendix C provides more
detail on the design and operation of the NIPAT security tool.

6.2  EXPERIMENTAL  RESULTS  FROM  THE  EVOLUTION  OF  COMPLEXITY

This section presents the results obtained from the emulation and
their relation to the theoretical definition of complexity discussed
earlier in this report, as well as their relation to vulnerability
analysis and complexity measurement in general. In this emulation, the
three types of computationally related entities (data, programs, and
Turing Machines) are initially distributed in random locations within
the two-dimensional grid-space. Each entity generates a fixed amount
of heat during the emulation run. Each entity also has an internal
variable termed its "unhappiness", which is a normalized distance of
the entity from its desired temperature. An entity cooler than its
ideal temperature will move one grid-space per time unit in the
direction of warmth. As entities congregate, heat increases, forming
red hazy hot spots on the grid.
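
The movement rule described above can be sketched as a single
heat-seeking step. This is a deterministic simplification under stated
assumptions: the name next_cell, the eight-cell neighborhood, the
wrap-around edges, and the rule that an entity warmer than ideal moves
toward the coolest neighbor are illustrative choices, and the
randomized movement pattern mentioned elsewhere in the report is
omitted.

```python
def next_cell(heat, pos, ideal_temp):
    """Return the grid cell an entity at `pos` would move to in one time unit."""
    rows, cols = len(heat), len(heat[0])
    r, c = pos
    here = heat[r][c]
    if here == ideal_temp:
        return pos                      # already content; stay put
    neighbors = [
        ((r + dr) % rows, (c + dc) % cols)
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    ]

    def temp(cell):
        return heat[cell[0]][cell[1]]

    if here < ideal_temp:
        # Too cold: step toward the warmest adjacent cell, if it is warmer.
        best = max(neighbors, key=temp)
        return best if temp(best) > here else pos
    # Too hot (assumed behavior): step toward the coolest adjacent cell.
    best = min(neighbors, key=temp)
    return best if temp(best) < here else pos


heat = [[0, 0, 0], [0, 5, 9], [0, 0, 0]]
cold_move = next_cell(heat, (1, 1), ideal_temp=20)   # moves toward the 9
```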

Begin with a single entity that contains a Turing Machine, program,
and data. This single entity will evolve by executing its program upon
its current data at each time step. In Figure 117 the number of
program transitions is compared and contrasted to the data bit-string
length. Note that all bit-strings are contained in circular queues of
size 100 bytes. Once the queue is filled, the oldest data is
overwritten with newly generated data. Thus, there is a bound on the
maximum space available. There are several reasons for using the
circular queues. The first is a practical reason; unlimited growth
quickly slows the real-time execution of the emulation. Also, in
reality, one generally does not feed all available data into a program
if it is not necessary, for exactly the reason mentioned. Finally, the
cumulative effects of new data complexity will more quickly become
apparent if the older data is eventually discarded.


Figure  117.  Single  Entity  Transitions  and  Length. 

In Figure 118, the resulting complexity is plotted for each evolution,
that is, for each program execution. Note that this complexity measure
includes the program complexity itself. However, as the program does
not change for the results presented in this report, the decrease in
complexity is due to the data. It should be noted that the program
executed shuffles the input data to generate the output data. Clearly,
the complexity reaches a minimum constant value in Figure 118.

Figure 118. Single Entity Change in Complexity.

In order to begin to study the relation of position with complexity, a
metric called envelopment has been defined as the complement of the
inverse of the number of directly adjacent neighbors of an entity, as
shown in Definition 8.1. The value plotted in the following graphs is
the expected value for all entities in the system. High values of
envelopment cause a reduction in the location complexity. Clustering
produces more heat, thus raising the "happiness" of the entities
within the cluster, as can be seen in Figure 119.

Figure 119. Envelopment and Happiness.
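
Read literally, "the complement of the inverse of the number of
directly adjacent neighbors" is 1 - 1/n for an entity with n
neighbors. The sketch below computes that value and its mean over all
entities. Definition 8.1 is not reproduced here, so three points are
assumptions: "directly adjacent" is taken to mean the eight
surrounding cells, an entity with no neighbors is given envelopment 0,
and the expected value is taken as a plain mean.

```python
def envelopment(neighbor_count):
    """1 - 1/n for n adjacent neighbors; 0.0 when there are none (assumed)."""
    if neighbor_count <= 0:
        return 0.0
    return 1.0 - 1.0 / neighbor_count


def count_neighbors(occupied, cell):
    """Count occupied cells among the eight cells surrounding `cell`."""
    r, c = cell
    return sum(
        (r + dr, c + dc) in occupied
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )


def expected_envelopment(occupied):
    """Mean envelopment over all occupied cells."""
    cells = list(occupied)
    total = sum(envelopment(count_neighbors(occupied, cell)) for cell in cells)
    return total / len(cells)


# Three clustered entities and one isolated entity.
positions = {(0, 0), (0, 1), (1, 0), (5, 5)}
mean_env = expected_envelopment(positions)
```

Under this reading an entity with a single neighbor also scores 0, so
the metric only begins to rise once an entity has two or more
neighbors.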

Envelopment, or clustering, raises the possibility that programs,
data, and Turing Machines will meet. When any two different entities
meet, the contents of both are copied to each entity. Note that a
complete representation of the system would have to include the
movement and exchange of data. This is purposely not included in our
results; instead, a sample of sub-component complexities is examined.
It would be rare, especially in today's communication networks for
example, to have complete access to all relationships. Typically only
the data points presented by an SNMP MIB, for example, are available.
When an entity contains program, data, and Turing Machine, it executes
the program, generating new data. Figure 120 shows the number of data
and program exchanges (labeled Data Cp, Prog Cp in Figure 120) with
program execution enabled (labeled wEx in Figure 120) and without
program execution. Program execution lags data and program exchange by
a small amount, because program execution cannot proceed until all
three entities meet. Also, when program execution is enabled many more
exchanges take place. This is because program execution creates new
data to be exchanged. When execution is enabled, the amount of data in
the system rises rapidly. All data previously gathered by an entity,
bounded by the size of the circular queue, is fed into a Turing
Machine when program execution occurs. Each Turing Machine is equipped
with a timeout mechanism so that the program will forcibly end in case
of endless loops, or simply too much data. The number of forced
timeouts is shown in Figure 121. Any data generated before the timeout
is considered valid output. There were no program exceptions in this
case.

Figure 120. Data Exchanges and Generation with and without Program
Execution.

Figure 121. Program Timeouts.

In Figure 122, the state transition per bit-string length shows the
total number of Turing Machine state transitions per length of
bit-strings. Initially, the system has no program execution, only
information exchange. It is possible for a zero, which indicates end
of tape, to be shuffled into the middle of an output tape. Only the
result up to the zero is regarded as output; the remainder of the
program transitions may be considered subjectively as wasted effort.
Thus, the Turing Machine programs can execute state transitions
without generating the equivalent amount of output.

Figure 122. Turing Machine Transitions and Bit-String Length.

Now consider the focus of this work, complexity. The complexity
estimation used in this analysis is the compression-based estimate of
complexity described earlier. In this emulation, the three main types
of entities are tagged and certain rules are enforced. Thus, for
example, data cannot execute a program and a program cannot attempt
to execute a Turing Machine. Given this case, the authors hypothesize
that a given bound on the complexity will eventually be reached in
the system. In Figure 123, the estimated complexity of the system
grows faster when program execution is enabled as contrasted with
entity exchange only. This is a key result. In a closed environment
with a fixed number of programs and data, the only way for the
complexity of an entity to change is through exchange of information.
No new information is created except through copying information
from one entity to another. When the Turing Machines are enabled in
each entity, they shuffle the input data, creating new data that is
then available to be exchanged with other entities.

Figure 123. Total System Estimated Complexity.

Compare and contrast the total bit-string length with the total
bit-string complexity throughout the emulation shown in Figure 124
with program execution. Notice that the total system bit-string
length increases at a much greater rate than the complexity. This
occurs because the initial activity is data and program exchange as
entities meet within the system and data spreads through sharing. As
information sharing decreases because each entity already has a copy
of the information, the primary activity becomes data generation
through program execution.

Figure 124. Bit-String Length.

In Figure 125, the complexity per length of data is plotted. Notice
that with program execution enabled the complexity per length of data
increases. In the case of operation without program execution, only a
finite length of data exists. When the total estimated complexity of
the data within each entity is plotted over time, the system with
program execution shows a marked increase in estimated complexity
over the pure exchange system.

Figure 125. Complexity per Length.

In Figure 126, the expected value of the complexity of all
bit-strings contained by an entity is plotted versus time. The larger
curve shows the average complexity per entity when program execution
is enabled. As expected, program execution with this particular
program, one that shuffles data, increased the average system
complexity. Note that the difference in complexity does not appear
until approximately time 100. However, program execution events began
slightly before time 100. A question important for systems that would
make decisions based upon complexity concerns the delay between the
time of the cause of a complexity change and the time for a
measurable increase in complexity to occur. This question is an
interesting one that is outside the scope of this report.

Figure 126. Expected Complexity over Time.


This emulation validated several intuitive concepts. The first
involves complexity in a closed system; that is, a system defined as
a fixed number of entities in which no data, programs, or Turing
Machines may enter from outside the system. A program injected into
the system that, on a microscopic scale, increases complexity
generated greater average macroscopic system complexity. The results
of this study also help to validate our initial hypothesis: the
greater the complexity, the more effort required to understand it.
The emulation with program execution, showing a higher complexity,
required many more probes in order to understand its behavior than
the system with only program and data exchange enabled. This implies
that an attacker with no a priori knowledge of the system would have
a more difficult time understanding and thwarting the more complex
system. This leads to the next section on complexity-based
vulnerability analysis.

6.3  COMPLEXITY-BASED  VULNERABILITY 
ANALYSIS 

Automated  Discovery  of  Vulnerabilities 
without  a  priori  Knowledge  of  Vulnerability 
Types 

Any vulnerability analysis technique for Information Assurance must
account for the innovation of an attacker. Such a metric was
suggested about 700 years ago by William of Occam [27]. Occam's Razor
has been the basis of much of this report and of the complexity-based
vulnerability method to be presented. The salient point of Occam's
Razor and complexity-based vulnerability analysis is that the better
one understands a phenomenon, the more concisely the phenomenon can
be described. This is the essence of the goal of science: to develop
theories that require a minimal amount of random information.
Ideally, all the knowledge required to describe a phenomenon should
be algorithmically contained in formulae. Observe an input sequence
at the bit-level and concatenate it with an output sequence at the
bit-level. This input/output concatenation is for either the entire
system or for components of the system. If there is low complexity in
the input/output observations, then it is likely to be easy for an
attacker to understand the system. Hypothesis 9.1 explicitly states
the means of measuring the complexity of a system component, or
protocol interaction, to a potential attacker.

Mozart and vulnerability analysis

Science is art and art is science. One of the most mathematical of
art forms is the composition of music. Music is compressed and
transported over the Internet very frequently, and most listeners of
such music probably have little interest in the compression ratio of
a particular piece of music. However, this piece of information can
be very interesting and informative with regard to the complexity of
a piece of music. One would expect an incompressible piece of music
to be highly complex, perhaps bordering on random noise, while a
highly compressible piece of music would have a very simple
repetitive nature. Most people would probably prefer music that falls
in a mid-level of complexity: sounds that are not repetitious and
boring, yet not random and annoying, but follow an internal pattern
in the listeners' minds. Music is a mathematical sequence that the
composer is posing to the listener; the more readily the listener can
extrapolate the sequence, without it being too challenging or too
easy, the more pleasing the music sounds. Carrying the music analogy
forward in a more explicit manner, consider the listener as an
attacker and the composer as the designer of an information system.
If a user has a preference for a given type of music, a sample of
that music can be included as a hypothesis in an MML-based complexity
analysis. The lower the complexity, the more appealing the music to
that particular listener. The more easily the listener can
extrapolate the musical sequence, the more vulnerable the system.

Imagine the composer who wishes his music to be enjoyed by only a
specific group of listeners and no others. The composer is
constrained from generating a completely invulnerable system, that
is, a totally random one, because the composer wants the music to be
meaningful to at least some potential group of listeners. Relating
this analogy to the hydrostatic test that was mentioned in the
introduction to this report, vulnerability is the quantification of
the potential leakage of music that is enjoyable to unintended
listeners.

In a very quick experiment, the following three pieces of music were
tested for complexity: Beethoven's Sonata Op. 27, No. 2
("Moonlight"), Mozart's Sonata in A Major ("Alla Turca"), and Philip
Glass's Opening to "Glassworks." The encoding explicitly represents
notes, timing, dynamics, and phrasing. Beethoven's Moonlight Sonata
has a complexity rating of 0.13, Mozart's Sonata has a complexity
rating of 0.16, and Glass's Opening to Glassworks has a complexity of
0.03. Philip Glass makes extreme use of repetitious arpeggios in his
work, thus the low complexity rating. This author expected Beethoven
to have a slightly higher complexity than Mozart, but it was the
reverse in this case. Note that one could decompose the overall
complexity to determine the complexity of a composer's use of rhythm,
note structure, phrasing, or other musical components. Once a
composer's typical complexity band is benchmarked, this type of
analysis could be used as an indicator to determine authenticity.
Additionally, one could conjecture that "learning" to model Beethoven
or Mozart might be more difficult than other composers.

Experimental  validation  of  complexity-based 
vulnerability  analysis 

A model information system has been implemented in Mathematica to
begin experimental validation of complexity-based vulnerability
analysis. The goal is to determine the vulnerability, not only of the
overall system, but also of system components. Vulnerability analysis
should be done without any a priori knowledge about system operation
or knowledge of particular types of vulnerabilities. Expert systems
and vulnerability analysis tools that rely upon rules identifying
particular types of vulnerabilities are inherently brittle and, in
fact, meaningless against an innovative attacker. Our Mathematica
information system model purposely does not include component
descriptions or explanations because the goal is for the system to be
a black box with respect to vulnerability. The point is that a
vulnerability analysis can be done without having to know the details
of the system. At the end of this analysis the functions of the
analyzed components are mentioned. It should then make intuitive
sense that a particular component performing a simple operation had a
lower complexity than one performing a more "random" operation.

Each component of an information system modeled in Mathematica
contains probe points through which bit-level input and output can be
collected. A complexity function based upon a simple inverse
compression ratio is used as an estimate of complexity. The intent is
to experiment with better complexity measures as the project
continues.
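
As a rough illustration of an inverse-compression-ratio estimate, the
following sketch uses Python's zlib rather than the Mathematica
implementation referred to in the text. The function name
complexity_estimate and the choice of compressor are assumptions; any
general-purpose compressor could stand in, and the result is only a
crude upper-bound style estimate of Kolmogorov Complexity.

```python
import os
import zlib


def complexity_estimate(data: bytes) -> float:
    """Inverse compression ratio: compressed size divided by original size.

    Values near 0 suggest highly regular data; values near (or slightly
    above) 1 suggest data the compressor cannot shorten.
    """
    if not data:
        return 0.0
    return len(zlib.compress(data, 9)) / len(data)


repetitive = b"ab" * 2000         # highly regular input
random_like = os.urandom(4000)    # essentially incompressible input

print(complexity_estimate(repetitive))
print(complexity_estimate(random_like))
```

The repetitive string yields a small ratio and the random bytes a
ratio close to one, which matches the intuition that a component
emitting easily compressed input/output observations is easier for an
observer to model.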

Figure 127 shows results from complexity measures taken of
accumulated input and output of three separate components of the toy
information system.

The graphs show the complexity of bit-level input and output strings
concatenated together. That is, observe an input sequence at the
bit-level and concatenate it with an output sequence at the
bit-level. This input/output concatenation is for either the entire
system or for components of the system. If there is a low complexity
in the input/output observations, then it is likely to be easy for an
attacker to understand the system, as in Hypothesis 9.1. Note that
these graphs are showing estimates of Kolmogorov Complexity. If MML
[37] were used, the attacker's hypothesis would be used to determine
the complexity relative to a particular attacker. In Figure 127 the
X-axis is the number of input and output observations concatenated to
form a single string of bits. The particular complexity estimate used
in this example is very poor; however, an ongoing area of research is
to improve complexity estimates. Because of the inaccurate complexity
metric, all the figures show a rising complexity with the number of
accumulated observations. However, notice the rate at which the
complexity rises in each of the figures. From Table 11, it would
appear that Component E is most vulnerable due to its low rate of
increase in complexity, while Component B appears to be the least
vulnerable due to its steeper rise in complexity. These results make
intuitive sense because Component E is simply transmitting data
without any form of protection while Component B is adding noise to
the data. This vulnerability method does not take into account
whether a component reduced or increased complexity; in other words,
whether the change was endothermic or exothermic complexity behavior.
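
The comparison of rates of rise can be sketched as follows. This is
an illustrative toy, not the Mathematica model: the two components
(pass_through and add_noise), the observation width, and the use of
zlib's compressed length as the complexity estimate are all
assumptions. Each observation concatenates a component's input with
its output, observations are accumulated, and the average growth in
estimated complexity per observation is reported.

```python
import random
import zlib


def compressed_length(data: bytes) -> int:
    """Crude complexity estimate: length of the zlib-compressed string."""
    return len(zlib.compress(data, 9))


def complexity_slope(component, n_obs=50, width=32, seed=0):
    """Average rise in estimated complexity per accumulated observation."""
    rng = random.Random(seed)
    accumulated = b""
    lengths = []
    for _ in range(n_obs):
        x = bytes(rng.getrandbits(8) for _ in range(width))  # component input
        accumulated += x + component(x, rng)                 # input || output
        lengths.append(compressed_length(accumulated))
    return (lengths[-1] - lengths[0]) / (n_obs - 1)


def pass_through(x, rng):
    """Hypothetical component that forwards data unchanged."""
    return x


def add_noise(x, rng):
    """Hypothetical component that XORs random noise into the data."""
    return bytes(b ^ rng.getrandbits(8) for b in x)


print("pass-through slope:", complexity_slope(pass_through))
print("noise-adding slope:", complexity_slope(add_noise))
```

In this toy setting the pass-through component shows the shallower
slope, because its output is fully predictable from its input, which
parallels the reading of the table that a lower rate of increase
indicates greater vulnerability.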


Table 11 Component Vulnerabilities

Component    K(xl°P starts end)
C            19.6011
B            19.6302
E            19.0013

These results show that vulnerabilities can be systemically
discovered. These vulnerabilities can be quantified to a value within
the bounds of the complexity measure error. When used in an MML [37]
approach to complexity measurement, apparent complexity, that is,
complexity as seen by a particular attacker, can be determined. Thus,
this work has led towards automatic generation of vulnerabilities
without requiring expert knowledge of each type of vulnerability and
in a more complete manner, depending upon the number of components
analyzed. Note that all possible combinations of components must be
analyzed in this manner.

An information system should be designed in such a manner that the
apparent complexity of the system under attack can be determined with
respect to the attacker, and that information used to maximize the
distance in the apparent complexity between the attacker and
defenders in an automatically reconstituted system. An Active Network
[16] is an ideal environment in which to experiment with an
implementation of automated system reconstitution because it provides
extreme flexibility in fine-grain code movement and composition of
code. Apparent Complexity is used to reconstitute the system such
that the complexity difference is maximized between legitimate users
and attackers of the system. In this section, the discussion is
limited to the automated hardening of a system based upon information
about an attacker and a new form of vulnerability analysis.

Design  of  a  complexity-based  vulnerability 
analysis  tool 

A vulnerability analysis tool should quickly and efficiently identify
and display the vulnerability of an information system ranging from
an application to a node, a network, or an interconnection of
networks. The tool should be portable, easy to use, and have minimal
impact upon a system. The tool should also be integrated with the
management of the system or network. The approach taken has been to
use the lessons learned from the emulation in this analysis to
consider the design of a complexity-based vulnerability analysis
system within the context of network management. The Swarm framework
in the emulation previously described queries potentially large
numbers of entities, much like SNMP polls data for system management.
A salient difference is that in a Swarm model, simulation time can be
controlled in order to make certain that all data is queried at
precisely the requested simulation step interval. However, design and
minimization of the number of data points to be collected, such that
system operation can be described as fully as possible, is exactly
the same in actual Management Information Base design for system and
network management.

While integration with system management using a de facto standard
such as SNMP is a future goal, a set of Java packages has been
developed that implements lightweight "probes" that transparently
gather data from a bit stream and report the result to the
vulnerability analysis portion of the Java package. This design was
chosen because the implemented probes are extremely lightweight and
also initiate the transfer of bit-level data, rather than waiting to
respond to a management query. Once a better understanding of
complexity is obtained, transition to implementation in SNMP will
likely take place. The Swarm emulation ran with 200 such probes, one
in each entity object, and successfully reported the results to the
analysis package. The authors intend to pursue the study of
complexity within an active network environment [16], particularly
with regard to network reconstitution in the face of attack.

Future work to be addressed includes the best way to present data
collected from hundreds of complexity probes and how to incorporate
that data into a useful integrated view, particularly in a network
management context. Much more detailed analysis of the complexity
emulation results is required, as well as modifications to the basic
emulation, in order to understand the effects of access controls in
the exchange of information, the role of complexity in automated
service composition and fault tolerance, and how complexity can best
be used for determining network attacks and fault conditions.

Applying  complexity  to  vulnerability  analysis 

Vulnerability analysis is defined as the process of quantifying the
vulnerability of an information system to attack. An attack is
defined as the act of an unauthorized user extracting unauthorized
information from a system. As discussed previously, the information
system appears to the attacker as a natural physical entity to be
researched, its behavior explored. Complexity, in the colloquial
sense, should be high for the attacker and low for legitimate users.
Kolmogorov Complexity is an omniscient being's measure of absolute
complexity; by definition it measures the size of the smallest
program that can generate a given string. In a later section apparent
complexity is introduced as a potentially more feasible measure.
However, the use of complexity for information assurance in general
is discussed here, and the term complexity will refer to Kolmogorov
Complexity in this section. In the steps that follow, a simple,
idealized approach is discussed. Many complicating details need to be
addressed if this is to be literally applied; however, it provides a
starting point for exposition of the concept.

The first step in computing complexity is to map the entire
observable portion of the information system to a string. This string
ideally represents data points collected over the entire system at
every instant in time. Note that a data system includes arbitrary
input, computation, and output. This is included as part of the
string. Including every possible input and output will result in an
extremely long string. From Definition 6.1, complexity is the length
of the minimal program running on a Universal Turing Machine that is
capable of generating that string. There is only one possible length
for the shortest program; thus the result is a single, ultimate
quantification of complexity.

The ability of the attacker to compute the complexity of portions of
the above string is directly relevant to the attacker's understanding
and ability to predict future behavior of the system. This includes
understanding the system's vulnerabilities. Using the method of
computing complexity as described in the previous paragraph would
lead to an infinite string; it is necessary to map the infinite into
the finite in order to make the process feasible.

There are a few observations that make the process more feasible. The
first is results-based scoping. For example, the attacker is most
likely to be interested in certain very specific results, such as
obtaining specific types of information. Thus, the attacker can
attempt to narrow the observation points in the string to only those
that appear to be promising. On the other hand, the defender of the
information system is primarily concerned with intrusions and other
fault behavior that compromises Information Assurance. Thus, the
defender can narrow the scope to strings that contain results that
lead to those outcomes.

The second is spatial scoping of the string. For example, the
defender can compute the complexity of various components in the
system, instead of the entire system. For example, only portions of
the system relevant to information access can be analyzed. The result
is a mosaic of localized complexity measures of the system. This is
equivalent to computing complexity over various portions and widths
of the string. As shown in Section 6.2, the composition of two
components results in a complexity that is no larger than the sum of
the complexities.
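
For reference, the subadditivity property cited above is commonly
stated in algorithmic information theory in the form below. This is
the standard textbook form for the concatenation xy of two strings,
offered here as an assumption about what the earlier section
establishes; the exact additive term depends on the variant of
Kolmogorov Complexity in use.

```latex
K(xy) \le K(x) + K(y) + O\bigl(\log \min\{K(x), K(y)\}\bigr)
```

Under this bound, the sum of the localized component measures gives
an upper estimate, up to the logarithmic term, of the complexity of
the composed string.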

The  network  insecurity  path  analysis  tool  and 
complexity-based  vulnerability  analysis 

An experimental prototype tool has been developed that combines the
grid-based vulnerability analysis technique with the complexity-based
vulnerability analysis method developed in this project. This section
discusses the enhanced tool shown in Figure 127.

Figure 127. Prototype tool combining the grid-based vulnerability
analysis technique with the complexity-based vulnerability analysis
method.

Every information system is assumed to take data of some form as
input, process the data, and return data as output in some form.
Essentially, every information system can be defined as a
mathematical operation. Information systems developed by humans today
tend to be highly structured in order to be tractable in their
development and maintenance. Generally, there are well-defined data
flows and processing functions within the information system. The
system is composed of a hierarchical composition of functional units.
This is especially true in legacy network systems where layered
design is ubiquitous. For these systems, one can imagine complexity
probes located at the input and output of every functional unit in
the system. This allows determination of the vulnerability of each
process and data stream at a very granular level. This provides a
complexity-based vulnerability map for the system. A potential
attacker would be unlikely to have such a detailed understanding of a
target information system; an optimization to this technique is to
limit probe locations to only those locations likely to be observable
to an attacker.

The vulnerability map is used to determine insecurity flow through
the system. Complexity is viewed as resistance to attack. Both a most
likely path and a maximum flow algorithm are applied in this
experimental complexity-based vulnerability analysis tool. The most
likely path is determined by finding the lowest complexity path from
a given attack point to a given target point. The maximum flow
algorithm assumes that lower complexity paths have a greater
capacity. The question arises as to what "flow" means in terms of
complexity. Firstly, the entire foundation of complexity-based
vulnerability analysis rests upon the likelihood, or probability, of
attack being successful upon the low complexity locations of an
information system as per Hypothesis 9.1. The complexity probe values
are displayed as links in the complexity tool display shown in
Figure 127. The values of the links are 1/K, and these values are
normalized to 1.0 for each node in order to obtain a probability of
successful attack upon each link. The maximum flow algorithm provided
by this tool shows the optimized placement of resources by an
attacker to maximize the likelihood of a successful attack.
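
The link weighting and most likely path can be sketched as follows.
This is a hedged illustration, not the prototype tool's code: the
graph, the K values, and the function names link_probabilities and
most_likely_path are hypothetical, and the path search (Dijkstra's
algorithm over negative log-probabilities, which maximizes the
product of link probabilities) is one plausible reading of "lowest
complexity path". The maximum flow computation is not shown.

```python
import heapq
import math


def link_probabilities(k_values):
    """Convert {node: {neighbor: K}} into per-node probabilities.

    Each link gets weight 1/K, normalized so a node's links sum to 1.0.
    """
    probs = {}
    for node, links in k_values.items():
        total = sum(1.0 / k for k in links.values())
        probs[node] = {nbr: (1.0 / k) / total for nbr, k in links.items()}
    return probs


def most_likely_path(probs, source, target):
    """Return (path, probability) maximizing the product of link probabilities."""
    best = {source: 0.0}   # lowest accumulated -log(probability) per node
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node == target:
            break
        for nbr, p in probs.get(node, {}).items():
            new_cost = cost - math.log(p)
            if new_cost < best.get(nbr, math.inf):
                best[nbr] = new_cost
                prev[nbr] = node
                heapq.heappush(heap, (new_cost, nbr))
    if target not in best:
        return None, 0.0
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return path, math.exp(-best[target])


# Hypothetical complexity probe values K on each directed link.
K = {
    "attack": {"a": 2.0, "b": 8.0},
    "a": {"target": 4.0, "b": 12.0},
    "b": {"target": 1.0},
}

path, probability = most_likely_path(link_probabilities(K), "attack", "target")
print(path, probability)
```

With these made-up values the low-complexity link out of the attack
point dominates, so the route through node "a" is selected even
though the final link from "b" has the lowest K of all.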

The  distribution  of  insecurity  information 

The approach to measuring the complexity of a system, as described
throughout this document, results in determining the ease with which
a potential attacker can understand the system. It does not directly
account for the fact that information about the target system can be
obtained by a potential attacker in algorithmic form, that is, in the
form of an attack tool. Such a tool does not require the attacker to
understand its operation. The attack tool is like an active packet,
or a parasite that depends upon its host for transportation. This is
distinct from a virus, whose primary function is replication and
transport. For example, an attacker may have little understanding of
a particular system, yet download an attack tool that allows the
attacker to perform a successful attack. Thus, the distribution of
attack knowledge needs to be considered. Once an attack tool is in
the hands of an attacker, the apparent complexity is greatly reduced.
There is an interesting feedback mechanism here; data that can reduce
the apparent complexity to a potential attacker needs to be kept
secure by the defender. Once obtained by an attacker, a significant
drop in apparent complexity occurs, potentially leading to further
significant reduction in apparent complexity as more vulnerability
information is obtained and disseminated to other attackers.
One might view the evolution of complexity in the following terms. An
information system is built. Initially, an attacker discovers its
least complex components. The attacker decides to automate his attack
(active) and/or publish the mechanism to accomplish the attack
(passive). This information is disseminated through the population.
Meanwhile the information system defenders, usually after
considerable delay, discover the attack mechanism and patch the hole.
The population of attackers, building upon their knowledge, exploits
the next least complex link from their view in the information
system. The defenders eventually close this hole. The cycle continues
ad infinitum. The cycle of attack and defense can be viewed through
complexity as a cycle, or evolution, of complexity as shown in
Figure 128. Low complexity portions of a system will eventually be
learned and disseminated by an attacker. To account for this
dissemination of low complexity information, defenders reinforce the
low complexity areas with more complexity. The results of this
project allow system developers not only to understand where the
vulnerable portions of the system are located, but also to engineer
their systems in such a manner as to control the cycle shown in
Figure 128. This process can be modeled as low complexity portions of
an information system that evolve in complexity over time.

Figure 128. Cycle of attack and defense viewed through complexity as
a cycle, or evolution, of complexity.
A Coherent Self-Healing System

Self-Composition, Genetic Algorithms, and Kolmogorov Complexity


Fault tolerant and self-healing systems should have the ability to
self-compose solutions to faults. This should be an inherent part of
system operation, rather than a structure imposed from "outside" the
system. Genetic Algorithms are on the path towards self-composing
solutions; however, genetic algorithms, as implemented today, require
external control to manipulate the genetic material. In other words,
the genetic algorithm itself must be programmed into the system; if
the genetic algorithm code failed, then the self-healing capability
would fail. While this situation is not ideal, it is explored as a
possible step towards a truly self-healing system. Active networking
is a novel approach to network architecture in which network nodes
(switches, routers, hubs, bridges, gateways, etc.) perform customized
computation on packets flowing through them. The network is called an
"active network" because new computations are injected into the nodes
dynamically, thereby altering the behavior of the network. Packets in
an active network can carry fragments of program code in addition to
data. Customized computation is embedded within the packet's code,
which is executed on the intermediate network nodes.

Many active network components and services have been designed and implemented and are undergoing experimentation. The ABone (Active Network Backbone) implements a relatively large-scale (given the novelty of the technology) active network (O(100) nodes). However, the fundamental science required to understand and take full advantage of active networking is lagging behind the ability to engineer and build such networks. In fact, the current Internet, whose protocols were built upon the ill-defined goal of simplicity, is only slowly being understood. An outcry from the Internet community, with its carefully crafted, static protocol processing and massive documentation (O(4000) Requests for Comments) of passive (non-executable) packets, is that it is already "too" complex.


An adaptive fault tolerant system, no matter how resilient, would be unlikely to receive acceptance by industry or the community if it were considered "complex" in the colloquial sense. How can such systems, which require complexity to be adaptive, at the same time appear simple to understand and manage? Are active networks really more complex than the current Internet? Are adaptive applications built upon active networks any more or less complex than the same applications built upon the legacy Internet? Does a measure of complexity exist that would allow an objective comparison to be made? What are the benefits of an active network with respect to passive networks? While these are extremely difficult questions to answer, this report attempts to lay the groundwork for answering them by proposing a complexity measure, Kolmogorov Complexity, and an adaptation mechanism, Genetic Programming, based upon an analogy with biological systems.

Kolmogorov Complexity was applied as a measure of potential algorithmic information content for use in prediction and control of an active network [185]. In the remainder of this paper, the term complexity will be used to indicate a particular form of complexity known as Kolmogorov Complexity. Kolmogorov Complexity is a measure of the length of the smallest program that, when executed upon a Universal Turing Machine, generates a particular string of bits x. The length of such a smallest program, K(x), is the complexity of the bit-string x. It should be noted that research has been performed on the use of genetic programming to evolve the smallest program for a given bit-string, and thus estimate K(x). Complexity was applied to optimize the combined use of communication and computation within an active network, that is, to determine the optimal amount of code versus data [185]. It was shown that if the Kolmogorov Complexity of the information related to the prediction of the future state of the network is estimated to be high, then the ability to develop code representing the non-random, or algorithmic, portion of that information is low. This results in a low potential benefit for algorithmic coding of the information; the benefit of having code within an active packet would appear to be minimal in such cases. Conversely, if the complexity estimate is low, then there is great potential benefit in representing information in algorithmic form within an active packet. It was suggested that if the algorithmic portion of information changes often and impacts the operation of network devices, then active networking provides the best framework for implementing solutions [185]. This is precisely the case in genetically programmed network services, a new class of services that are not pre-defined but instead evolve themselves in the network in response to the state of the network. In this report, we will restrict this class to those services that are programmatic solutions for perceived faults that occur in a network. Further research is required to generalize this class to include other types of network services.
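
As a rough illustration of the code-versus-data trade-off described above, the sketch below uses a general-purpose compressor as a stand-in estimator for K(x). This is an assumption made purely for illustration: zlib yields only an upper bound on the true (incomputable) complexity, and the function names are hypothetical rather than taken from [185].

```python
import os
import zlib


def estimated_complexity_bits(data: bytes) -> int:
    """Upper-bound estimate of K(x) in bits: the zlib-compressed size of data."""
    return 8 * len(zlib.compress(data, 9))


def algorithmic_benefit(data: bytes) -> float:
    """Fraction of the raw size saved by the compressed form.

    Values near 1.0 suggest mostly pattern (worth carrying as code);
    values near or below 0.0 suggest mostly randomness (carry as data).
    """
    if not data:
        return 0.0
    return 1.0 - estimated_complexity_bits(data) / (8 * len(data))


patterned = b"01" * 4096        # highly regular, low complexity
random_like = os.urandom(8192)  # incompressible with overwhelming probability

print(algorithmic_benefit(patterned))
print(algorithmic_benefit(random_like))
```

A low estimate for the patterned string corresponds to the case where an algorithmic representation inside an active packet pays off; the random string offers essentially no such benefit.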

Frameworks for protocol and service composition have been developed for active networks, one of which is well described in [186]. Thoughts on the requirements for protocol and service composition are also discussed in [187]. However, the work done to date is lacking in that it does not address how active code will be generated rapidly enough to make dynamic injection of the code a significant factor. The argument against active and programmable networks is that, given enough time, memory, and processing power, legacy systems could eventually contain all the functionality that active networks could have injected. To do this, legacy developers would have to know a priori all possible functionality that would be required in the network. However, this report demonstrates that it is possible for the network to generate code rapidly and in a manner that can never be known a priori for every possible condition. The inspiration for a genetic algorithm based approach to solution composition comes from nature in the form of the docking problem in molecular biology [188]. Solutions that efficiently match a particular fault should be able to "dock" with the fault. Prediction of successful docking in biology can be attempted by searching for minimal energy or minimal geometric construction combinations. Here we consider a genetic algorithm used to self-compose solutions that mitigate network faults. One goal of the experiment discussed later in this report is to study the relationship between complexity and solution composition. In particular, it has been hypothesized that the complexity of the fault and potential solution will decrease as the optimal solution is composed. Specific examples of faults that could be simulated are:

•  Network mis-configuration

•  Bandwidth and processor mis-allocation

•  Faults caused by Distributed Denial of Service and virus attacks

•  Poor traffic shaping

•  Routing problems

•  Non-optimal fused data within the network

•  Poor link quality in wireless and mobile environments

•  Mal-composed protocol framework models in the network

•  Poorly tuned components of network services

A simple fault, namely, mis-allocation of bandwidth and processing capability resulting in packet jitter, has been chosen as a working example. A fitness function defines a metric for the "goodness" of a population. In this case, "goodness" is the reduction in the variance of packet arrival times. The fault is represented by the difference between the fitness of the actual system and a minimum required fitness. Genetic material will evolve to minimize the effect of the fault. The complexity of the combined fault-solution pair should be at a minimum when the fitness is optimal. We will borrow a term from molecular biology and call a perfectly matched fault and solution a successful "docking".
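
The jitter working example can be sketched as follows. The source states only that "goodness" is the reduction in the variance of packet arrival times; the specific mapping 1 / (1 + variance) onto the range zero to one, and the names `jitter_fitness` and `fault_magnitude`, are assumptions introduced here for illustration.

```python
from statistics import pvariance


def jitter_fitness(arrival_times):
    """Fitness in (0, 1]: 1.0 means perfectly regular inter-arrival gaps."""
    gaps = [b - a for a, b in zip(arrival_times, arrival_times[1:])]
    if len(gaps) < 2:
        return 1.0  # too few packets to observe any jitter
    return 1.0 / (1.0 + pvariance(gaps))


def fault_magnitude(arrival_times, required_fitness):
    """The fault: shortfall of the observed fitness below the required minimum."""
    return max(0.0, required_fitness - jitter_fitness(arrival_times))


steady = [0.0, 10.0, 20.0, 30.0, 40.0]   # constant 10-unit gaps
jittery = [0.0, 4.0, 20.0, 27.0, 40.0]   # gaps of 4, 16, 7, 13

print(jitter_fitness(steady), fault_magnitude(steady, 0.9))
print(jitter_fitness(jittery), fault_magnitude(jittery, 0.9))
```

Under this mapping, a candidate solution "docks" with the fault when applying it drives `fault_magnitude` to zero.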

COMPLEXITY AND EVOLUTIONARY CONTROL

Complexity and evolution are intimately linked. Kolmogorov Complexity, K(x), is the length of the optimal compression of string x. This incomputable, yet fundamental, property of information has vast implications in a wide range of applications including system management and optimization [192], [193], security [194], [195], and bioinformatics. Active networks [196] form an ideal environment in which to study the effects of trade-offs in algorithmic and static information representation because an active packet is concerned with the efficient transport of both code and data. As noted in Figure 129, there is a striking similarity between an active packet and DNA. Both carry information having algorithmic and non-algorithmic portions. The algorithmic portion of DNA has transcription control elements as well as the codons [197]. The active packet has control code and may contain data as well.
Kolmogorov Complexity and Genetic Programming have complementary roles. Genetic Programming has been used to estimate Kolmogorov Complexity [198], [199]. Genetic Programming benefits from Kolmogorov Complexity as a measure and means of controlling not only the complexity, but also the size and generality of the result [200]. One of the most obvious uses for complexity in networking is Programmatic Compression [201]. In this report, the foundation is developed for the use of complexity to enable the network to self-heal. The next section describes the Minimum Description Length algorithm and its role in active networks.

THE APPLICATION OF COMPLEXITY IN A COMMUNICATIONS NETWORK

The goal of the system that has been implemented is to utilize the benefit of an active network to automatically generate solutions that bring the network back into line with a healthy model of the system. The fitness function is used to describe the desired outcome. The concept of molecular docking, mentioned previously, requires a more precise measurement of the degree of "fit" in the docking of a fault and solution. In this project, we are exploring the use of Kolmogorov Complexity, estimated via the Minimum Description Length algorithm, as the means to measure the fit between the fault and the desired state. The next paragraph describes the Minimum Description Length complexity estimator and its relationship to active networking.

•  Nucleotide Bases: A, C, G, U

•  Triplets (Codons) result in translation to Amino Acids within the Ribosome

•  Chromosome Structure: List of connected unit pairs. In eukaryotes these reside in the nucleus: ((Delay, Delay)(Join, Split)...)

•  DNA Strand and Active Packet operate in the same manner: both carry control and data

•  Dualism in genetic world (gene as information and algorithmic code) first noted by von Neumann

Figure 129. DNA and an Active Packet.

A question active network application developers must answer is: "How can I best leverage the capabilities that active networks have to offer?" Because the word "active" in active networks refers to the ability to dynamically move code and modify execution of components deep within the network, this typically leads to another question: "What is the optimal proportion of content for an active application that should be code versus data?" A method for obtaining the answer to this question comes from direct application of Minimum Description Length (MDL) [202] to an active packet. Let D_x be a binary string representing x. Let H_x be a hypothesis or model, in algorithmic form, that attempts to explain how x is formed. Later in this report, we view H_x as a predictor of x in the analysis of Active Virtual Network Management Prediction. For now let us focus on developing a measure of the complexity of x. MDL states that the Kolmogorov Complexity can be estimated by the length of the shortest two-part encoding. The two parts are the length of a model (hypothesis) that generates string x and the length of the shortest encoding of x using that hypothesis. This can be represented mathematically as K(x) = K(H_x) + K(D_x | H_x). Note that error in the hypothesis or model must be compensated within the encoding. A small hypothesis with a large amount of error does not yield the smallest encoding, nor does an excessively large hypothesis with little or no error. A method for determining K(x) can be viewed as separating randomness from non-randomness in x by "squeezing out" the non-randomness, which is computable, and representing it algorithmically. The random part of the string, that is, the part remaining after all pattern has been removed, represents pure randomness, unpredictability, or simply, error. Thus, the goal is to minimize l(H_e) + l(D_x | H_e) + l(E), where l(x) is the length of string x, H_e is the estimated hypothesis used to encode the string D_x, and E is the error in the hypothesis. The more accurately the hypothesis describes string x and the shorter the hypothesis, the shorter the encoding of the string. Choosing an optimal proportion of code and data minimizes the packet length.
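
The MDL trade-off between a short hypothesis with many errors and a long hypothesis with none can be sketched with a toy model family. Everything specific here is an assumption for illustration and not the estimator used in the report: the hypothesis "x repeats its first k bits", the cost of k bits plus a length field for the model, and the cost of one position index per mismatched bit for the error.

```python
from math import ceil, log2


def two_part_length(x: str, k: int) -> int:
    """Bits needed to describe x as 'repeat the first k bits' plus corrections."""
    n = len(x)
    index_bits = ceil(log2(n + 1))   # enough bits to write any value 0..n
    model_bits = k + index_bits      # the pattern itself plus the value of k
    errors = sum(1 for i, bit in enumerate(x) if bit != x[i % k])
    error_bits = index_bits + errors * index_bits  # error count, then one position each
    return model_bits + error_bits


def best_period(x: str) -> int:
    """Pattern length k that minimizes the total two-part description length."""
    return min(range(1, len(x) + 1), key=lambda k: two_part_length(x, k))


clean = "0110" * 16                      # 64 bits, exactly periodic
noisy = clean[:10] + "0" + clean[11:]    # same string with bit 10 flipped

for x in (clean, noisy):
    k = best_period(x)
    print(k, two_part_length(x, k), two_part_length(x, len(x)))
```

In both cases the shortest description is the 4-bit pattern (the "code") plus a small correction (the "data"), which is far shorter than shipping all 64 bits as a model with no error.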

The proposed hypothesis is that the Kolmogorov Complexity of a combined fault and solution description is minimized when the optimal solution to mitigate the fault is composed. A nearly trivial example can be seen with reverse code. Assume that fault data, F, exists. Assume that the fault does not erase data but merely transforms it. Define the algorithmic description of the fault data as P_F(). The reverse code for P_F() will be labeled R_PF(). Assume P_F() and R_PF() are minimal length programs. Then R_PF(P_F()) = Φ, where Φ is the empty set. R_F is the data generated by R_PF(). Since the fault does not erase any data, the process is reversible [191] and therefore K(R_F) - K(F) = 0. The equivalence in complexity of R_F and F follows because there is no loss or gain of complexity when the system is restored to a prior state using the anti-fault process R_PF; there is no work performed. The algorithmically reversed fault will be referred to as an anti-fault in this report.
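
A minimal sketch of the reverse-code idea, assuming a hypothetical fault that transforms data without erasing any of it (a byte-wise XOR followed by a reversal of byte order). The functions `fault` and `anti_fault` stand in for P_F() and R_PF(); the particular transformation is invented for illustration.

```python
def fault(data: bytes, key: int = 0x5A) -> bytes:
    """Hypothetical non-erasing fault: XOR every byte with key, then reverse the order."""
    return bytes(b ^ key for b in data)[::-1]


def anti_fault(data: bytes, key: int = 0x5A) -> bytes:
    """Reverse code: undo the fault's steps in the opposite order."""
    return bytes(b ^ key for b in data[::-1])


original = b"routing table v1"
damaged = fault(original)
restored = anti_fault(damaged)

print(damaged != original, restored == original)
```

Because each step is a bijection on byte strings, the anti-fault is obtained mechanically by inverting the steps in reverse order, and no information has to be supplied from outside to restore the prior state.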

The descriptive complexity of the fault and the solution should ultimately be as low as possible, and the Minimum Description Length algorithm can be used, among other complexity estimators, as a technique to guide solution composition. In fact, this is the case with reversible code. Complexity is important information for several reasons. First, it is an indicator of the type of fault, the level of difficulty in correcting the fault, and the severity of the fault; fault severity is important in triage operations to optimize system health. Second, a more compact algorithmic representation of a fault will travel more rapidly through the network; it is an efficient format for alerting system management and for triggering automated solutions. Third, it can be relatively easy to reverse the code of an algorithm, possibly generating an anti-fault, or solution to a problem, in certain cases. Reversible code has been presented in previous work as a mechanism for generating anti-messages in Time Warp simulation [203].

Fault tolerant and self-healing systems should have the ability to self-compose solutions to faults. Ideally, composition should be an inherent part of system operation, rather than a structure imposed from "outside" the system. Genetic algorithms are on the path towards self-composing solutions; however, genetic algorithms, as implemented today, require external control to manipulate the genetic material. In other words, the genetic algorithm itself must be programmed into the system; if the genetic algorithm code failed, then the self-healing capability would fail. While this situation is not ideal, it is explored as a possible step towards a truly self-healing system.

One of the contributions of this report is the study of complexity in genetic algorithms with the goal of eventually designing self-composing solutions. Genetic algorithms are widely known for their ability to find optimal solutions, avoiding local extremes, by using evolutionary-like processes dependent upon "random" mutation. Kolmogorov Complexity describes the randomness of information. The Kolmogorov Complexity of the genetic material during the evolution of a genetic algorithm can be estimated and yields interesting clues about the underlying physics of the information during its evolution towards a fitness function. It is our hypothesis that, as the evolution proceeds and the fitness level of the genetic material rises, the complexity decreases. This result yields an interesting insight that supports the hypothesis that "solutions" that self-compose to mitigate a fault will tend to decrease in complexity.

THE GENETIC ALGORITHM

The goal of this study is to examine how complexity, specifically an estimate of Kolmogorov Complexity, relates to the evolution of a self-composing solution. We consider a genetic algorithm to be an approximation of a self-composing system. Details on the operation of genetic algorithms can be found in [204], [205], [206]. This paper assumes a basic understanding of genetic algorithm operation and provides only a brief overview. In this experiment a pre-existing Mathematica genetic algorithm package (see footnote 3) is used. The decision to use Mathematica was based upon its combination of symbolic and arithmetic capabilities and because many of our research utilities, including Kolmogorov estimation functions, are implemented in Mathematica.

3. Written by Mats G. Bengtsson, National Defense Research Establishment, Box 1165, S-581 11 Linkoping, Sweden; email: matben@lin.foa.se

The genetic algorithm package assumes a population of binary strings of preset size whose values, when converted to a float type, are between zero and one. Similarly, the fitness function is assumed to accept and return values in the range from zero to one. Fitness values closer to one are assumed to indicate more highly optimized results. A genetic algorithm consists essentially of three parts: selection, crossover, and mutation. In selection, each string is selected with a probability proportional to its fitness value. In crossover, a pair of selected strings is determined, a position along the string is chosen at random, and the right and left parts of each string are swapped. In mutation, each gene is changed at random with a low probability; in this case a probability of 0.002 was chosen based upon repeated experimentation. Each individual is coded as a binary string of length 10 bits. This length provides the size necessary to achieve numerical precision while being small enough to allow a large population size without excessive overhead. The problem is limited to one dimension with value x, which represents the real value of the bits in string x and varies from zero to one. The first step is to create a random population. The population is defined on the real axis from zero to one. The random values are represented in the form of binary strings. Next a fitness function is defined on the interval zero to one. The fitness function in this example is defined as f(x) = sin(πx). Thus, binary representations of values that are odd multiples of 0.5 will have maximal fitness.
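
The three steps described above can be sketched in Python. This is not the Mathematica package used in the experiment; it is a minimal re-creation under stated assumptions: a 10-bit string is decoded as an integer divided by 2^10, the fitness is taken to be sin(πx), and parents are paired consecutively for single-point crossover. All function names are hypothetical.

```python
import math
import random

BITS = 10               # string length used in the text
MUTATION_RATE = 0.002   # per-bit mutation probability used in the text


def decode(bits: str) -> float:
    """Map a bit string to a real value in [0, 1) (assumed encoding)."""
    return int(bits, 2) / 2 ** BITS


def fitness(bits: str) -> float:
    """Assumed fitness sin(pi * x), which peaks at x = 0.5 on the unit interval."""
    return math.sin(math.pi * decode(bits))


def random_population(size: int, rng: random.Random) -> list:
    return ["".join(rng.choice("01") for _ in range(BITS)) for _ in range(size)]


def mutate(individual: str, rng: random.Random) -> str:
    """Flip each bit independently with probability MUTATION_RATE."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < MUTATION_RATE else bit
        for bit in individual
    )


def step(population: list, rng: random.Random) -> list:
    """One generation: fitness-proportional selection, crossover, mutation."""
    # Selection: a tiny offset keeps the weights valid if every fitness is zero.
    weights = [fitness(ind) + 1e-9 for ind in population]
    parents = rng.choices(population, weights=weights, k=len(population))

    # Crossover: swap the tails of each consecutive pair at a random cut point.
    children = []
    for a, b in zip(parents[0::2], parents[1::2]):
        cut = rng.randrange(1, BITS)
        children += [a[:cut] + b[cut:], b[:cut] + a[cut:]]
    if len(parents) % 2:
        children.append(parents[-1])  # an unpaired parent is copied unchanged

    return [mutate(child, rng) for child in children]


rng = random.Random(0)
population = random_population(128, rng)
for _ in range(50):
    population = step(population, rng)
print(sum(fitness(ind) for ind in population))  # cumulative fitness
```

The cumulative fitness printed at the end is the quantity that the later figures compare against the estimated complexity of the population.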

Kolmogorov  Complexity 

This section discusses a general approach for self-composing solutions using lessons learned from the previous section. The approach can be described as the automated generation of a solution hypothesis H = R(H_e - H_f), that is, the reverse of the algorithmic difference between the faulty and correct algorithmic representation of behavior by controlled means. As H_f deviates from H_e, complexity, or heat as presented here, is generated. In [192] the relationship between fault and energy is explored and simulated (see [195], [194], [207] for recent work on complexity, energy, and Information Assurance). The motivation for that experiment came from the relationship between Kolmogorov Complexity and entropy. The definition and application of Kolmogorov Complexity to vulnerability analysis identified how Kolmogorov Complexity can be used to determine vulnerabilities in a system as areas of low complexity. An underlying hypothesis of our work is that computation and communication are fundamentally related through complexity theory, and, thus, bandwidth and processing utilized in denial of service are fundamentally interrelated. Low complexity data or code consuming large amounts of bandwidth or processing indicates the likelihood of an attack. A model of complexity evolution within a closed system is described in reference [194]. That reference developed an abstract model with which to study complexity, specifically Kolmogorov Complexity, of information within an information system. That model explores K(x), a measurement of length in bytes, and K(x)/s, a measure of the maximum increase in complexity of the system due to code entering a system, such as code carried by active packets. The rate of complexity increase in terms of algorithmic active packet complexity in units of K(x)/s within the closed system was measured. Significant changes in system complexity indicate the presence of faults. Reference [208] reported the results of Kolmogorov Complexity probes that detect Distributed Denial of Service attacks.

An active network environment is used to emphasize that information assurance laws must be able to deal with many alternative and dynamically changing representations of information. With regard to active packets and information theory, passive data is simple Shannon compressed data, and active packets are a combination of data and program code whose efficiency can be estimated by means of Kolmogorov Complexity [209]. The active network Kolmogorov Complexity estimator is currently implemented with a variety of compression estimators ranging from simple empirical entropy to more complex algorithms beyond the scope of this conference. The probe returns an estimate of the smallest compressed size of a string. The simplest estimator, trading accuracy for speed and low overhead, is based upon computing the entropy of the weight of ones in a string. Specifically, it is defined in Equation 19, where x#1 is the number of 1 bits and x#0 is the number of 0 bits in the string whose complexity is to be determined. Entropy is defined in Equation 20. See [209] for other measures of empirical entropy and their relationship to Kolmogorov Complexity. The expected complexity is asymptotically related to entropy as shown in Equation 21. Observe an input sequence at the bit-level and concatenate with an output sequence at the bit-level. This input/output concatenation is observed for either the entire system or for components of the system. Low complexity input/output observations quantify the ease of understanding by a potential attacker. Previous work has demonstrated the use of Kolmogorov Complexity for Distributed Denial of Service (DDoS) attack detection [208].

(19)  \hat{K}(x) = l(x) \, H\left( \frac{x_{\#1}}{x_{\#1} + x_{\#0}} \right) + \log_2 l(x)

(20)  H(p) = -p \log_2 p - (1.0 - p) \log_2 (1.0 - p)

(21)  H(X) \sim \sum_{l(x) = n} P(X = x) \, K(x)
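
The simplest estimator (Equations 19 and 20) can be written directly, as in the sketch below. The function names are hypothetical, the bit string is assumed to be given as a text string of "0" and "1" characters, and the convention that the entropy term is zero when p is 0 or 1 is the usual limit rather than something stated in the source.

```python
from math import log2


def binary_entropy(p: float) -> float:
    """Equation 20: entropy of a binary source with P(1) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # limit value; avoids log2(0)
    return -p * log2(p) - (1.0 - p) * log2(1.0 - p)


def estimate_complexity(bits: str) -> float:
    """Equation 19: l(x) * H(fraction of ones) + log2(l(x))."""
    length = len(bits)
    if length == 0:
        return 0.0
    ones = bits.count("1")  # x#1; the zeros x#0 are length - ones
    return length * binary_entropy(ones / length) + log2(length)


print(estimate_complexity("0" * 64))    # all zeros: only the length term remains
print(estimate_complexity("01" * 32))   # balanced weight: maximal estimate
```

Note that this estimator sees only the weight of ones, so the strictly periodic string in the second call is scored as if it were random; that is the accuracy being traded for speed.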
Because Kolmogorov Complexity was originally derived for the study of randomness, it is interesting to note that randomness plays a significant role in the operation of the genetic algorithm itself. The initial genetic material should be generated randomly. Selection of genes for mutation and crossover points should also be done randomly. Finally, selection of gene pairs is done randomly, but in proportion to their fitness value.

Given the randomly generated nature of the initial genetic material, one would expect the complexity of the genetic material to decrease as the genetic algorithm evolves. This is clearly the case in the initial steep downward spike shown in Figure 130. As the algorithm continues to evolve and the fitness of the genetic material improves, one would expect structure and order to appear. As mentioned earlier, in this specific case, the algorithm encourages the growth of binary strings that represent odd multiples of 0.5.

Figure 130. Complexity of Genetic Material versus Evolutionary Time Steps with Population 128.

Figure 130 shows the complexity, estimated as the compressed size, of the genetic material as a function of evolutionary steps. Compare with Figure 131, which shows the sum of the fitness values as a function of evolutionary steps. The complexity decreases as the cumulative fitness function increases, then rises again while evolution continues; however, the fitness function does not significantly increase. The complexity measure seems to indicate that the first optimal genetic composition was found near evolution step 50. As the genetic algorithm continued beyond that point, the genetic material became more complex again with no corresponding benefit in fitness. This result was unanticipated, but is plausible as new solutions evolve, with varying complexity, attempting to maximize fitness.

Figure 131. Cumulative Fitness Function of Genetic Material with Population 128.

The cumulative fitness function results (multiplied by 10 to shift upward for easier comparison with the estimated complexity) are shown in Figure 132. Note that the points of high complexity always coincide with points of low cumulative fitness. Points of relatively low complexity correspond to high cumulative fitness. Arrows point to extrema in the estimated complexity that can be seen to align with extrema in the cumulative fitness function. In particular, minima in estimated complexity occur simultaneously with opposing maxima in the fitness. This indicates an inverse relationship between complexity and cumulative fitness extreme points.

Figure 132. Complexity and Fitness Comparison.

Consider the complexity of the fitness function itself. The fitness function is an algorithmic representation of the fitness of a chromosome. The range resulting in maxima generated from the fitness function forms a string that represents the target complexity. In this particular genetic algorithm example, a solution of 0.5 for all 128 members of the population would yield an estimated complexity of 611.3. This low a level of complexity was never reached for two reasons: there are multiple optimal solutions, namely odd multiples of 0.5, and the algorithm never exactly achieved odd multiples of 0.5, but rather values close to them. The remaining sections discuss how these concepts have been implemented to construct a fault tolerant network.
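
As a rough consistency check on the target complexity quoted above, the Equation 19 estimator can be worked by hand. The assumptions here are not from the source: a population of 128 (the size given in the captions of Figures 130 and 131), 10-bit individuals, and the value 0.5 encoded as 1000000000, so that each individual contributes exactly one 1 bit.

```latex
\begin{align*}
l(x) &= 128 \times 10 = 1280 \text{ bits}, \qquad p = \tfrac{128}{1280} = 0.1 \\
H(0.1) &= -0.1 \log_2 0.1 - 0.9 \log_2 0.9 \approx 0.3322 + 0.1368 = 0.4690 \\
\hat{K}(x) &= l(x)\, H(p) + \log_2 l(x) \approx 1280 \times 0.4690 + 10.32 \approx 610.6
\end{align*}
```

Under these assumptions the estimate is close to, though not identical with, the 611.3 quoted in the text; the small gap presumably reflects a detail of the encoding or estimator that is not given here.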

TOWARDS A SELF-EVOLVING NETWORK SYSTEM

Other papers from the Imperishable Networks Project have developed complexity-based techniques for fault detection and identification, as discussed in [210] and [193]. The focus of this report is on progress towards self-composition of solutions, assuming that other techniques, particularly complexity-based techniques, have identified faults. A problem with the genetic algorithm-based approach as previously described for use as a self-evolving system is that control is generally external to the genetic material and the genetic material is generally considered to be passive data. Instead, the genetic material should be capable of being algorithmic information, that is, program code or objects. In addition, each chromosome, as an object, should contain the necessary capability to run the genetic algorithm. This would allow for a highly distributed and robust genetic algorithm capable of fault mitigation where the fault is represented through the fitness function.

A criticism of this approach might be that a genetically engineered
protocol stack will create a complex framework that will be difficult
to understand and maintain. However, our approach is to compose the
framework from simple components. Each of these components will be
individually verifiable with respect to its properties and actions.
As the components are arbitrarily composed to form a protocol stack,
some protocol stacks may be generated that violate the principles of
safety, consistency, and correctness. One way to approach this is to
define a fitness function that verifies the suitability of the stack
with respect to the properties desired. Any misconfigured protocol
stacks are automatically eliminated from consideration if the fitness
function is carefully defined to check for the above-mentioned
properties. However, this might make the definition of the fitness
function itself cumbersome, as every possible stack composition
property will have to be known a priori and an appropriate fitness
"filter" defined. This will lead to a loss of elegance in the fitness
function definition and consequently poor maintainability. A better
approach would be to define syntactic and certain semantic
composition properties in the individual components themselves,
possibly in the form of logical expressions. These expressions will
enforce constraints on the behavior of the components, which can be
verified at run-time. The run-time system will embed a theorem
prover, which can be either a full-blown prover like PVS, NuPrl or
SPIN or a reduced version of one, to systematically verify properties
during composition itself. This reduces the burden on the programmer
to define a proper fitness function that can catch and eliminate all
types of composition errors.
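The idea of carrying composition constraints inside the components
can be illustrated with a minimal sketch. It is not the project's
implementation: the Component class, the requires/provides property
labels, and the component names are all hypothetical, and a simple
set-containment check stands in for the logical expressions and the
embedded theorem prover described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """A stack component carrying its own composition constraints (hypothetical)."""
    name: str
    requires: frozenset = frozenset()   # properties needed from lower components
    provides: frozenset = frozenset()   # properties offered to higher components


def composition_errors(stack):
    """Check a bottom-to-top stack; return (component, missing properties) pairs."""
    available = set()
    errors = []
    for comp in stack:
        missing = comp.requires - available
        if missing:
            errors.append((comp.name, sorted(missing)))
        available |= comp.provides
    return errors


# Illustrative components only; the property labels are assumptions.
link = Component("Link", provides=frozenset({"framing"}))
route = Component("Route", requires=frozenset({"framing"}),
                  provides=frozenset({"delivery"}))
reorder = Component("Reorder", requires=frozenset({"delivery"}),
                    provides=frozenset({"ordering"}))

valid = composition_errors([link, route, reorder])
invalid = composition_errors([route, link])
```

Because the check runs while the stack is being composed, a
misconfigured stack can be rejected before any fitness evaluation,
which keeps the fitness function itself small.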

Approach 

Genetic material begins in a random state (M), and converges to the
complexity of the optimal value produced by the fitness function.
This enables true solution composition from a wide range of possible
solutions. One problem with this approach is the time required to
evolve towards a feasible solution. Another problem is that the
fitness function itself has to be self-generated in some manner.
Using Active Virtual Network Management Prediction [196], the fitness
function exists in the form of He, where He is the estimated correct
operation hypothesis of the system as described in [185].

In summary, the experiment in this section has shown a relationship
among fitness, complexity, and the evolution of genetic material.
Complexity estimation probes have been embedded in the General
Electric Global Research Center Active Network testbed for use in
security experimentation. The next section explains the framework
developed to utilize the same complexity probes described in [208] to
control the evolution of a genetic program within the active network.
This makes the network highly resilient to faults by enabling the
capability to adapt in a wide variety of ways.

GENETIC NETWORK PROGRAMMING
ARCHITECTURE

The Magician Active Network [196] overlay network is used to test the
feasibility of the genetically programmed network service concept. An
active packet representing the nucleus (assuming network nodes are
like eukaryotes, that is, cells containing nuclei) is injected into
all the network nodes. The nucleus contains a population of
chromosomes, which are strings of functional units. Operation of
Genetic Network Programming begins with the injection of basic
building blocks, known as functional units, into the network as shown
in Figure 133. Currently, this "genetic material" is flooded into
each active node. However, the material will remain inactive in each
active node until a fitness function is injected into the network.
Receipt of a fitness function will cause evolution to proceed.

[Figure panel title: Architecture, Nucleus Injected at Startup]
Figure 133. Injection of the Nucleus.

Figure 134. Functional Units, Evolution, and Fitness.

Functional units are very small blocks of code that perform simple,
well-defined operations upon an active packet. Examples of functional
units are Delay, Split, Join, Clone, and Forward. There is also a
Null functional unit whose use is explained later. Chromosomes are
strings of functional units as shown in Figure 134. Once a chromosome
is assembled, the codons can be translated into Amino Acids at the
Ribosomes. In other words, the string of functional units will
operate upon active packets from other applications (or other
functional units) that traverse the node. The chromosome is
represented in the code in a form similar to a Lisp symbolic
expression, for example: ((Null Join Split) (Delay Split Join
Delay)).
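A chromosome of this kind can be sketched as a sequence of named
functional units applied in order to the packets passing through a
node. The sketch below is illustrative only. The report does not
define the behavior of each unit, so the packet model (a payload
string plus an accumulated delay) and the semantics given to Null,
Delay, Split, and Join are assumptions.

```python
# A packet is modeled as (payload, accumulated delay in ms); this is an assumption.

def null(packets):
    """Pass packets through unchanged."""
    return list(packets)


def delay(packets):
    """Add an assumed fixed 1 ms delay to every packet."""
    return [(payload, d + 1.0) for payload, d in packets]


def split(packets):
    """Divide each packet's payload into two halves."""
    out = []
    for payload, d in packets:
        mid = len(payload) // 2
        out.extend([(payload[:mid], d), (payload[mid:], d)])
    return out


def join(packets):
    """Concatenate adjacent pairs of packets; an odd packet out is kept as-is."""
    out = []
    for i in range(0, len(packets) - 1, 2):
        (p1, d1), (p2, d2) = packets[i], packets[i + 1]
        out.append((p1 + p2, max(d1, d2)))
    if len(packets) % 2:
        out.append(packets[-1])
    return out


UNITS = {"Null": null, "Delay": delay, "Split": split, "Join": join}


def run_chromosome(chromosome, packets):
    """Apply each functional unit of the chromosome in order."""
    for unit_name in chromosome:
        packets = UNITS[unit_name](packets)
    return packets


# The example population from the text, written as nested tuples.
population = (("Null", "Join", "Split"), ("Delay", "Split", "Join", "Delay"))
result_one = run_chromosome(population[0], [("abcd", 0.0)])
result_two = run_chromosome(population[1], [("abcd", 0.0)])
```

Representing the chromosome as plain data keeps it easy to mutate and
recombine, while the lookup table binds each symbol to executable
behavior.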

Mutation and recombination occur among a population of genes.
Mutation is a probabilistic change of a functional unit to another
functional unit. Recombination is the exchange of chromosome sections
from two different chromosomes. In Figure 135, a close-up of a single
node can be seen containing a very short chromosome strand.
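The two operators can be sketched as follows. This is a minimal
illustration, not the Magician implementation: the function names,
the mutation-rate parameter, and the choice of single-point crossover
are assumptions, and the sketch assumes both chromosomes hold at
least two functional units.

```python
import random

UNIT_NAMES = ("Delay", "Split", "Join", "Clone", "Forward", "Null")


def mutate(chromosome, rate, rng):
    """Replace each unit with a randomly chosen unit with probability `rate`.

    The random choice may select the same unit again, leaving it unchanged.
    """
    return tuple(rng.choice(UNIT_NAMES) if rng.random() < rate else unit
                 for unit in chromosome)


def recombine(a, b, rng):
    """Single-point crossover: swap the tails of two chromosomes."""
    cut = rng.randint(1, min(len(a), len(b)) - 1)  # needs length >= 2
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


rng = random.Random(0)  # seeded only to make the sketch repeatable
parent_a = ("Null", "Join", "Split")
parent_b = ("Delay", "Split", "Join", "Delay")
child_a, child_b = recombine(parent_a, parent_b, rng)
mutant = mutate(parent_b, 0.25, rng)
```

Single-point crossover preserves the total pool of functional units
across the two children; only mutation introduces units that neither
parent held.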

A single incoming traffic stream, as shown in Figure 136 entering the
center node, is split into multiple streams. Each stream is processed
by a different chromosome. Note that currently in our implementation
the full traffic stream is split along each chromosome; however, it
is hypothesized that traffic sampling could be used to reduce the
overhead in creating the multiple streams.

[Figure panel title: Single Node Active Evolutionary Control
Architecture]
Figure 135. Single Node Genetic Programming Architecture.

[Figure panel title: Traffic and Evolution]
Figure 136. Breeding Traffic Streams.
As shown in Figure 137, fitness functions can be designed to measure
quality at different layers of the traditional protocol stack. In
this particular case, fitness measures are shown at the Transport,
Network, and Link Layers. As a particular example, jitter control
might have a fitness function that minimizes per-frame variance at
the Link Layer. The Network Layer would attempt to maximize the
number of packets that arrive at the destination within a reasonable
time period, that is, perform the routing function. The Transport
Layer would have a fitness function that attempts to minimize
end-to-end packet variance. The key is that each of these fitness
functions needs to work together with the others towards reaching
the stated goal in a reasonable manner. More will be said about the
fitness function later.

[Figure panel title: Multi-Level Fitness Functions]
Figure 137. Multiple Levels of Fitness.

In Figure 138, recombination can occur either within
a node or between two nodes. In addition, as shown
in  Figure  139,  changing  the  route  of  a  packet  also 
effectively  accomplishes  a  recombination  because 
the  packet  processing  will  be  dependent  upon  the 
genetic  material  at  each  node  traversed. 

A key component of the evolutionary process is the fitness function.
Fitness functions are "user" defined and injected into the network to
control the evolution of the genetic population. For example, in our
initial tests, minimizing variance in transmission time was used as a
simple fitness function. However, initial experiments quickly
demonstrated that the design of the fitness function is the most
critical element. It reminds one of the saying, "Be careful of what
you pray for, because you might get it." Often the fitness was
achieved, but in ways that were unexpected and sometimes detrimental
to the intended operation of the network. As a trivial example, the
variance can be minimized by slowing the traffic to a near halt.
Thus, a low latency term had to be added to the jitter control
fitness function.

Figure 138. Recombination Levels.
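The effect of the added latency term can be shown with a small
sketch. The exact form of the fitness function used in the
experiments is not given here, so the weighted sum below, the
function name, and the sample transit times are assumptions chosen
only to illustrate the point.

```python
from statistics import fmean, pvariance


def jitter_fitness(transit_times_ms, latency_weight):
    """Higher is fitter: penalize variance plus a weighted mean latency.

    With latency_weight = 0 this reduces to pure variance minimization.
    """
    return -(pvariance(transit_times_ms)
             + latency_weight * fmean(transit_times_ms))


# Hypothetical measurements: a stream slowed to a crawl but perfectly
# steady, and a fast stream with mild jitter.
slow_steady = [100.0, 100.0, 100.0]
fast_jittery = [9.0, 10.0, 11.0]

# Variance alone rewards slowing traffic down.
variance_only_prefers_slow = (
    jitter_fitness(slow_steady, 0.0) > jitter_fitness(fast_jittery, 0.0))

# With a latency term the fast stream wins.
latency_term_prefers_fast = (
    jitter_fitness(fast_jittery, 0.1) > jitter_fitness(slow_steady, 0.1))
```

The weight sets how much latency the evolved solution may trade for
lower variance; a suitable value would have to be found empirically.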

[Figure panel title: Effective Chromosome Based Upon Route]
Figure 139. Chromosomes and Routing.

Genetically  programmed  active  network  jitter 
control 

As a feasibility test, an adaptive jitter control mechanism was
developed on a fixed, wired active communication network having the
topology shown in Figure 139. The genetic algorithm was implemented
as an active application in the Magician Active Network Execution
Environment [196]. Packets originate from the left-most node in
Figure 139 and are destined for the right-most node in the figure.
The dominant contributors to packet link transit time variability,
given the topology shown in Figure 139, are the fact that the active
network is an overlay network that has unspecified lower-layer
traffic and that packets are loaded and executed within a Java
Virtual Machine residing in each node and are subject to Java garbage
collection, which runs at unspecified times.

The fitness function on all nodes returns a greater fitness as the
packet link transfer time variance on the destination node is
minimized. The variance is obtained through a Simple Network
Management Protocol query of an Object Identifier that measures it.
As previously mentioned, the fitness function is itself an active
packet that consists of an objective function. The function is highly
general and can be composed of any mathematical function of
accessible metrics.

Figures 140 through 142 show packet link transit variance through
three of the chromosomes on the destination node, and Figure 143
shows packet link transit variance without any jitter control
mechanism at the destination node. Initial observation of the graphs
shows that, overall, particularly as time progressed, the chromosomes
significantly reduced packet transit variance.
Figure 140. Packet Link Transit Variance (milliseconds^2) on
Destination Node Through Chromosome One.

Figure 141. Packet Link Transit Variance (milliseconds^2) on
Destination Node Through Chromosome Two.

Figure 142. Packet Link Transit Variance (milliseconds^2) on
Destination Node Through Chromosome Three.

Figure 143. Packet Link Transit Variance (milliseconds^2) on
Destination Node Without Jitter Control.


Another observation of the experimental data is that the genetically
programmed transit variance was initially worse than transit variance
without any control mechanism. The reason for this is that the
chromosomes begin operation with a random set of functional units and
require time to converge to an optimal value.

Jitter control: a simple test case

While a priori techniques have been developed for jitter control in
legacy networks, jitter control forms a simple, easily measured and
controlled application for the network genetic programming technique.
The functional units injected into the network should allow evolution
of a variety of interesting solutions to reduce variance, including
adding delays, forwarding along different paths, or perhaps new ideas
that have not been thought of yet.
The most significant result from this project has been experimental
validation that complexity plays a critical role in information
assurance and can be broadly applied as the basis for security
analysis and fault-tolerant network design. Complexity Theory is a
large and rapidly evolving science (Figure 1). As progress is made in
various topics of Complexity Theory, the individual topics will help
to reinforce each other. For example, Vladimir Gudkov's results on
the minimum dimensions required to characterize information flow
could help develop a better Kolmogorov Complexity estimator. Our goal
has been to reduce the requirement for, and dependence upon, detailed
a priori information about known attacks and to detect novel attacks
by computing vulnerability and detecting anomalous behavior based
upon an inherent, fundamental property of information itself, namely,
its complexity and sophistication. Results of complexity measures
applied to network protocols, processes, and information have been
presented and related to Information Assurance and network fault
tolerance.

Accurate estimation of Kolmogorov Complexity is key to its usefulness
in identifying correlation between attack flows. We have made
progress in leveraging and developing in-line communication network
complexity estimators, and we are investigating and benchmarking more
estimators for K(x). Estimates of Kolmogorov Complexity provide an
objective parameter with which to provide information assurance
through anomaly detection and objective model development. The
capability of this metric is limited in part by the accuracy of its
estimation, which must be traded against computational expense. The
Optimal Symbol Compression Ratio complexity and sophistication
algorithm may provide additional capability to discern anomalous
behavior in information systems. Further research is needed to
develop strategies for cost-effective use of this paradigm across
entire systems.

With respect to the DDoS detection technique, its performance needs
to be compared to more intelligent detection algorithms that are
currently in use. In particular, its performance has to be measured
in terms of resource tradeoffs, detection and false-alarm
probability, and response time. For example, the current technique
performs its evaluation on the entire content of the packet.
Anecdotal evidence has shown that performance degrades if the payload
of the packet is encrypted and the size of the payload dominates the
size of the packet. Techniques that adapt to payload size have been
formulated and tested.

The next challenge is continuing the development of the K-Map
(Kolmogorov Complexity Map of a system) and applying theory using
Kolmogorov Complexity. For example, one significant application is
identifying and controlling faults and DDoS attacks and tracing
attacks back to the attacker. The fundamental hypothesis is that the
attacker can be traced using a complexity-based approach because
attacks that originate from a common source must share a common
pattern. We expect that advances in Complexity Theory, combined with
reflective capability enabled by Active Networking, will enable
significant advances in network fault tolerance and adaptation.
Appendix A

Operation of the Network Insecurity Path Analysis Tool (NIPAT)


Network  Insecurity  Path  Analysis  Tool  (NIPAT) 
allows  various  types  of  node  groupings  in  order  to 
help  visualize  the  vulnerability  paths. 

In  Figure  144,  all  object  types  are  grouped 
together.  The  nodes  could  also  be  grouped  by  such 
characteristics  as  hostname  or  sub-network.  In 
Figure  145,  the  vulnerabilities  that  have  been  identi¬ 
fied  and  grouped  as  vectors  to  vulnerability  targets 
have  been  expanded  to  show  more  detail  about  the 
individual  vulnerabilities. 

In Figure 146, 40 parent objects of sun4nbin are grouped within a
single node. Also note that the root account is clearly visible as
reachable through the vulnerability path. The next paragraph provides
an example of analyzing a vulnerability graph that provides a quick
introduction to more of NIPAT's capabilities.

One of the security assessment operations NIPAT can perform is to
determine the vulnerability of a particular entity given an attack on
a particular node. The target entity Host C Vulnerability 4 is
identified by a white crosshair in Figure 147, and the attacking node
is labeled Attacker with a flow identified by the label of its
connecting path. The optimal vulnerability path is the sum of flows
into node Host C Vulnerability 4, as shown in Figure 148. It has a
flow strength of 6.0. In Figure 149 the optimal path that the
attacker can take to reach the target is shown in yellow. Thus NIPAT
provides the ability to examine how the placement of security
safeguards such as intrusion detectors within the network affects
total network security. In effect, this tool becomes a
security-modeling tool, where one can experiment with the placement
of security safeguards representing such entities as firewalls,
intrusion detectors, and access lists. These can be positioned at
various locations in order to determine network security.

There are two main algorithms that can be run in NIPAT; the first is
a probabilistic analysis and the second is a maximum flow analysis.
Let us start with the probabilistic analysis. Select a node to be the
target of the attack by clicking on the Select Nodes toggle button.
Then select a node; in this case we have selected Host C Vuln 4. A
white crosshair will appear over the node to indicate it has been
selected. Choose Algorithms and Security Analysis Models and finally
choose Probabilistic Analysis. A text window, shown in Figure 148,
should appear which states the probability of successful attack,
followed by the result graph shown in Figure 149. The result graph
shows the most probable path of attack highlighted. The edge values
are normalized between zero and one to represent the probability of
an attacker choosing that path.

Figure 144. Attack Vectors.

Figure 145. Vector Graph Expanded View.

Figure 146. Target Details Expanded.

Now let us re-run the analysis using the maximal flow algorithm [12].
Choose File and Open GML. Then choose the gml directory and choose
the example.gml file. The graph window should appear. Select Host C
Vuln 4 again and choose Algorithms and Security Analysis Models and
Max Flow Analysis. The text window shown in Figure 150 should appear
as well as the graph results shown in Figure 151. The edge values
have been changed to show the maximum flow along each edge towards
the target node. In this case there is a flow of 1.0 and a flow of
5.0 that can reach the target node.
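A maximum flow analysis of this kind can be sketched with the
standard Edmonds-Karp algorithm (breadth-first augmenting paths). The
sketch is not NIPAT's code, and reference [12] may describe a
different maximum flow algorithm. The graph, node names, and edge
capacities below are hypothetical, chosen so that flows of 1.0 and
5.0 reach the target, as in the example above.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow over a dict-of-dicts capacity graph."""
    # Residual graph: copy the capacities and add zero-capacity reverse edges.
    residual = {u: dict(edges) for u, edges in capacity.items()}
    for u, edges in capacity.items():
        for v in edges:
            residual.setdefault(v, {}).setdefault(u, 0.0)

    total = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total  # no augmenting path remains

        # Walk back from the sink to recover the path and its bottleneck.
        path = []
        v = sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        total += bottleneck


# Hypothetical vulnerability graph; capacities model attack "flow".
graph = {
    "Attacker": {"Host A": 1.0, "Host B": 5.0},
    "Host A": {"Host C Vuln 4": 3.0},
    "Host B": {"Host C Vuln 4": 5.0},
}
total_flow = max_flow(graph, "Attacker", "Host C Vuln 4")
```

Lowering a capacity in the dictionary, which models adding a
safeguard on that edge, and re-running the function shows how the
total flow into the target changes.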
Figure 147. Probabilistic Attack Path Analysis.

Figure 148. Probability of Attack.

Figure 149. Most Likely Attack Path.

Figure 150. Maximum Flow Results.

Figure 151. Maximum Flow Graph.
Appendix B

Draft Standard: Inline Network Management


B.1. IN-LINE NETWORK MANAGEMENT PREDICTION
DRAFT-IETF-BUSH-INLINE-PREDICTIVE-MGT-00

Status  of  this  Memo 

•  This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.

•  Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.

•  Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

•  The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.

•  The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

•  This Internet-Draft will expire on December 30, 2002.

•  Copyright Notice: Copyright (C) The Internet Society (2002). All
Rights Reserved.

Abstract 

In-line network management prediction exploits fine-grained models of
network components, injected into the communication network, to
enhance network performance. Accurate and fast prediction of local
network state enables more intelligent network control resulting in
greater performance and fault tolerance. Accurate and fast prediction
requires algorithmic capability. Active and Programmable Networking
have enabled algorithmic information to be dynamically injected into
the network allowing enhanced capability and flexibility. One of the
new capabilities is enhanced network management via in-line
management code, that is, management algorithms embedded within
intermediate network devices. In-line network management prediction
utilizes low-level algorithmic transport capability to implement
low-overhead predictive management.

A secondary purpose of this document is to provide general
interoperability information for the injection of general purpose
algorithmic information into network devices. This document may help
in some manner to serve as a temporary bridge between Internet
Protocol and Active and Programmable Network applications. This may
stimulate some thought as to the content and format of "standards"
information potentially required for Active Networking. Management of
the Internet Protocol and Active and Programmable Networking is
vital.

In particular, coexistence and interoperability of active networking
and Internet Protocol management is specified in order to implement
the injection of algorithmic information into a network.

Implementation  Note 

This document proposes a standard that assumes the capability of
injecting algorithmic information, i.e., executable code, into the
network. Active or programmable capability, as demonstrated by recent
implementation results from the DARPA Active Network Program, Active
Internet Protocol [8] or recent standards in Programmable Networking
[9], helps meet this requirement. While in-line predictive management
could be standardized via a vehicle other than active packets, we
choose to use active networking as a convenient implementation for
algorithmic change within the network.

B.2.  INTRODUCTION 

This work in progress describes a mechanism that allows a distributed
model, injected into a network, to predict the state of the network.
The concept is illustrated in Figure 152. The state to be predicted
is modeled within each actual network node. Thus, a distributed
model, shown in the top plane, is formed within the actual network,
shown in the bottom plane. The top plane slides ahead of wall clock
time, although in an asynchronous manner. This means that each
simulated node MAY have its own notion of simulation time.




/  / - o...  / 

/  o - o...  / 

/  / - o - o...  / 

/_Distributed  Network  Model  Plane _ / 

(spatially  located  inside  the  actual  network  below,  but 
temporally  located  ahead  of  the  actual  network) 


/  / 

/  / - o. . .  / 

/  o - o...  / 


/  / - o - o...  / 

/_Actual  Network  Plane _ / 

Figure  152.  The  Distributed  Model  Inside  the  Network. 

This concept opens up a set of interoperability issues which do not
appear to have been fully addressed. How can distributed model
components be injected into an existing network? In-line models are
injected into the network assuming the overlay environment shown in
Figure 153. In-line models in Figure 152 are designed to run as fast
as possible in order to maintain a simulation time that is ahead of
wallclock, communicating via virtual messages with future timestamps.
What if messages are processed out-of-order because they arrive
out-of-order at a node? How long do you wait (and slow your
simulation down) to make sure they are not out-of-order? This
specification provides a framework that allows synchronization to be
handled in any manner, e.g., via a conservative (blocking) or
optimistic (Time-Warp) manner within the network. Additionally, how
can the models verify and maintain a reasonable amount of accuracy? A
mechanism is provided in this document to allow local verification of
prediction accuracy. Attempts to adjust accuracy are implementation
dependent. How do independent model developers allow their models to
work coherently in this framework? Model operation is implementation
dependent; however, this specification attempts to make certain that
model messages will at least be transported in an interoperable
manner, both across and WITHIN intermediate network devices. How does
one publish their model descriptions? How are predicted values
represented and accessed? Suggested solutions for these questions are
presented in this document as well.
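The optimistic option can be illustrated with a much-reduced sketch
of Time Warp style rollback. It is not the mechanism specified in
this draft. The class name, the smoothing rule used as the model
state, and the message values are hypothetical, and the sketch omits
the anti-messages that a full Time Warp implementation sends to
cancel output produced before a rollback.

```python
class LogicalProcess:
    """Optimistic model: process messages at once, roll back on a straggler."""

    def __init__(self):
        self.lvt = 0.0        # local virtual time
        self.state = 0.0      # predicted value (order-dependent smoothing)
        self.processed = []   # (timestamp, value, state_before, lvt_before)
        self.rollbacks = 0

    def _apply(self, timestamp, value):
        """Save the prior state, then advance the model by one message."""
        self.processed.append((timestamp, value, self.state, self.lvt))
        self.state = 0.5 * self.state + value
        self.lvt = timestamp

    def receive(self, timestamp, value):
        undone = []
        if timestamp < self.lvt:  # straggler: arrived in this process's past
            self.rollbacks += 1
            # Undo every message processed with a later timestamp.
            while self.processed and self.processed[-1][0] > timestamp:
                ts, val, state_before, lvt_before = self.processed.pop()
                self.state, self.lvt = state_before, lvt_before
                undone.append((ts, val))
        self._apply(timestamp, value)
        for ts, val in reversed(undone):  # re-execute in timestamp order
            self._apply(ts, val)


# Messages arrive out of order: timestamp 2 arrives after timestamp 3.
optimistic = LogicalProcess()
for ts, val in [(1, 4.0), (3, 8.0), (2, 2.0)]:
    optimistic.receive(ts, val)

# Reference run with the same messages delivered in timestamp order.
in_order = LogicalProcess()
for ts, val in [(1, 4.0), (2, 2.0), (3, 8.0)]:
    in_order.receive(ts, val)
```

A conservative implementation would instead block until no earlier
message could arrive, avoiding rollback at the cost of slowing the
simulation.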

Overview 

In-line predictive network management, which enables greater
performance and fault tolerance, is based upon algorithmic
information injected into a network allowing system state to be
predicted and efficiently propagated throughout the network. This
paradigm enables management of the network with continuous projection
and refinement of future state in real time. In other words, the
models injected into the network allow state to be predicted and
propagated throughout the network, enabling the network to operate
simultaneously in real time and in the future. The state of traffic,
security, mobility, health, and other network properties found in
typical Simple Network Management Protocol (SNMP) [2] Management
Information Bases (MIBs) is available for use by the management
system. To enable predictive management of applications, new MIBs
will have to be defined that hold both current values as well as
values expected to exist in the future.
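One way to picture an object that holds both current and expected
future values is a small table indexed by time. The sketch below only
illustrates that idea. It is not an SNMP implementation, and the
class name, method names, and sample values are hypothetical.

```python
import bisect


class TimeSeriesObject:
    """A managed value stored as a time-indexed table (illustrative only)."""

    def __init__(self):
        self._times = []    # sorted timestamps
        self._values = []   # value recorded or predicted for each timestamp

    def set(self, time, value):
        """Insert or overwrite the value for a timestamp, keeping order."""
        i = bisect.bisect_left(self._times, time)
        if i < len(self._times) and self._times[i] == time:
            self._values[i] = value
        else:
            self._times.insert(i, time)
            self._values.insert(i, value)

    def value_at(self, time):
        """Return the most recent value at or before `time`, or None."""
        i = bisect.bisect_right(self._times, time)
        return self._values[i - 1] if i else None

    def predictions_after(self, now):
        """Return (time, value) rows whose timestamps lie in the future."""
        i = bisect.bisect_right(self._times, now)
        return list(zip(self._times[i:], self._values[i:]))


load = TimeSeriesObject()
load.set(10, 5)    # observed
load.set(30, 9)    # predicted
load.set(20, 7)    # observed, inserted out of order
current = load.value_at(25)
future = load.predictions_after(25)
```

Keeping past and predicted rows in one table lets a manager read the
current value and the forecast with the same access pattern.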

The AgentX [5] protocol begins to address the issue of independent
SNMP agent developers dynamically and seamlessly interconnecting
their agents into a single MIB under the control of a master agent.
AgentX specifies the protocol between the master and sub-agents
allowing the sub-agents to connect to the master agent. The AgentX
specification complements this work-in-progress, namely, in-line
network management prediction. The in-line network management
prediction specification provides the necessary interface between
agent functionality injected remotely via an Active Packet and
dynamically linked^1 into a MIB. The agent code may enhance an
existing MIB value by allowing it to return predicted values.
Otherwise, coexistence with AgentX is SUGGESTED. The in-line network
management prediction specification enables faster development of MIB
modules with more dynamic algorithmic capability because Active and
Programmable networks allow lower-level, secure, dynamic access to
network devices. This has allowed injection of predictive capability
into selected portions of existing MIBs and into selected portions of
active or programmable network devices resulting in greater
performance and fault tolerance.

Outline 

This document proposes standards for the following
aspects of in-line predictive management:

•  SNMP  Object  Time  Series  Representation  and 
Manipulation 

•  Common  Algorithmic  Description 

•  Multi-Party In-line Predictive Model Access and
Control

•  Common  Framework  for  Injecting  Models  into 
the  Network 

•  Model  Interface  with  the  Framework 
The high-level components of this proposed standard are shown in
Figure 153. The Active Network Framework [10] is a work in progress.
In-line Predictive Management is the subject of this document. The
Internet Protocol and SNMP are well-known.

Figure 153 shows the various ways in which in-line predictive
management can be used in an active network given an implementation
in a particular execution environment. The in-line predictive
management application runs as an active application on an active
node. The framework is independent of the underlying architecture of
the active network, which can take one of two forms. The protocol
stack on the left shows a fully active network in which the Node
Operating System runs one or more Execution Environments. Multiple
active applications may execute in any Execution Environment. The
protocol stack on the right shows the architecture of an active
network overlay over IP. Essentially, the overlay scheme uses the
Active Network Encapsulation Protocol (ANEP) [7] as a conduit to use
the underlying IP network. The predictive management application
executes alongside the other active applications and interacts with
any managed active applications to provide their future state. Since
the predictive management application requires only the execution
environment to run in, it is independent of whether the active
network is implemented as an overlay or is available as a fully
active network.

The next section provides basic definitions. Following that, the
goals of this proposed standard are


[Figure 153 diagram: two protocol stacks. Each stack carries active
applications and the in-line predictive management application above
an Active Net EE. Other labels appearing in the diagram: NodeOS,
ANEP, Internet Protocol, SNMP, and "Active Network over IP".]

Figure 153. Relationship Among Underlying Assumptions about the
Predictive Management Environment.


laid out. The remainder of the document develops, in progressively
more detail, the definition of interoperability among algorithmic
in-line network management prediction components. Specifically,
predictive capability requires careful handling of the time
dimension. Rather than change the SNMP standard, a tabular technique
is suggested. Then, in order to simplify the design of predictive
management objects, an extension to Case Diagrams is suggested for
review and comment. This is followed by the specification of a
distributed predictive framework. It is understood that multiple
distributed predictive mechanisms exist; however, this framework is
presented for comment and review because it contains all the
necessary elements. Finally, the detailed interface between the
active or programmable code and IP standard interfaces is presented.

Definitions 

The following acronyms and definitions are helpful in understanding
the general concept of predictive network management.


In-line

Located within, or immediately adjacent to, the flow of network
traffic.

Predictive Network Management

The capability of reliably predicting network events or the state of
the network at a time greater than wall-clock time.

Fine-Grained Models

Small, lightweight, executable code modules that capture the
behavior of a network or application component to enable predictive
network management.

Algorithmic Information

Information, in the form of algorithms contained inside executable
code, as opposed to static, non-executable data. Depending upon the
complexity of the information to be transferred, an algorithmic
form, or an optimal tradeoff between algorithmic and non-algorithmic
form, can be extremely flexible and efficient.

Non-Algorithmic Information

Information that cannot be executed. Its transfer generally requires
a highly structured protocol, with well-defined code pre-installed
at all points en route, including source and destination.

Small-State

Information caches that can be created at network nodes, intended
for use by executable components of the same application.

Global-State

Information caches created at network nodes, intended to be used by
executable components of different applications.

Multi-Party In-line Predictive Management Model

An in-line predictive management model composed of multiple in-line
algorithmic models that are developed, installed, utilized, and
administered by multiple domains.

The following acronyms and definitions are useful in understanding
the details of the specific predictive network management framework
described in this document.


A (Anti-Toggle)

Used to indicate an anti-message. The anti-message is initiated by
rollback and is used to keep the system within a specific range of
prediction accuracy.

AA (Active Application)

An active network protocol or service that is injected into the
network in the form of active packets. The active packets are
executed within the EE.

Active Network

A network that allows executable code to be injected into the nodes
of the network and allows the code to be executed at the nodes.

Active Packet

The executable code that is injected into the nodes of an active
network.

Anti-Message

An exact duplicate of a virtual message except that the Anti-toggle
bit is set. An anti-message is used to annihilate an invalid virtual
message. This is an implementation-specific feature relevant to
optimistic distributed simulation.

DP (Driving Process)

Generates virtual messages. Generally, the DP is implemented as an
algorithm that samples network state and transforms the state into a
prediction. The prediction is represented by a virtual message.

EE (Execution Environment)

The active network execution environment: the environment that
resides on active network nodes and executes active packets.

Lookahead

The difference between Wallclock and LVT. This value is the distance
into the future for which predictions are made.

LP (Logical Process)

An LP consists of the Physical Process and additional data
structures and instructions which maintain message order and correct
operation as a system executes ahead of real time.

LVT (Local Virtual Time)

The LP contains a notion of time local to itself known as LVT. A
node's LVT may differ from other nodes' LVT and Wallclock. LVT is a
local, asynchronous notion of time.

M (Message)

The message portion of a Virtual Message is implementation specific.
This proposed standard SUGGESTS that the message contents be opaque;
however, an SNMP varbind, intended to represent future state, MAY be
transported. Executable code may also be transported within the
message contents.

NodeOS (Node Operating System)

The active network Operating System. The supporting infrastructure
on intermediate network nodes that supports one or more execution
environments.

PP (Physical Process)

A PP is an actual process. It usually refers to the actual process
being modeled, or whose state will be predicted.

QS (Send Queue)

A queue used to hold copies of messages that have been sent by an
LP. The messages in the QS may be sent as anti-messages if a
rollback occurs.

Rollback

The process of adjusting the accuracy of predictive components due
to packets arriving out-of-order or out-of-tolerance. Rollback is
specific to optimistic distributed simulation techniques and is thus
an implementation-specific feature.

RT (Receive Time)

The time at which the message value is predicted to be valid.

RQ (Receive Queue)

A queue used in the algorithm to hold incoming messages to an LP.
The messages are stored in the queue in order by receive time.

SQ (State Queue)

The SQ is used as an LP structure to hold saved state information
for use in case of a rollback. The SQ is the cache into which
pre-computed results are stored.

Tolerance

A user-specified limit on the amount of prediction error allowed by
an LP's prediction.

TR (Real Time)

The current time as a time-stamp within a virtual message.

TS (Send Time)

The LVT at which a virtual message was sent. This value is carried
within the header of the message. The TS is used for canceling the
effects of false messages.

VM (Virtual Message)

A message, or state, expected to exist in the future.

Wallclock

The current time.


Goals 

The goals of this document are:

• Simplicity: This document attempts to describe the minimum
necessary elements for in-line management prediction. Model
developers should be able to inject models into the network,
allowing SNMP Object value prediction. Such models should work
seamlessly with other predictive models in the network. The goal is
to minimize the burden on the model developer while also ensuring
model interoperability.

• Conformance: This document attempts conformance with existing
standards when and where it is possible to do so. The concept is to
facilitate a gradual transition to the active and programmable
networking paradigm.

• In-line Algorithmically-Based Management: This document attempts
to introduce the use of in-line algorithmic management information.

B.3. A COMMON REPRESENTATION OF SNMP OBJECT TIME SERIES FOR IN-LINE
NETWORK MANAGEMENT PREDICTION

SNMP, as currently defined, has a very limited notion of time
associated with state information. The temporal semantics are
expected to be applied to the state by the applications reading the
information. On the other hand, predictive management requires
generation, handling, and transport of information that carries the
temporal characteristics of the state, i.e., whether the information
is current, future, or perhaps past information. In other words, the
capability for handling the time dimension of management information
needs to be extended and standardized in some manner. In this
section, we propose a mechanism for handling time issues in
predictive management that requires minimal changes to the SNMP
standard.

A proposed standard technique for handling the time dimension in
predictive state systems is to build the SNMP Object as a Table
Object indexed by time. This is shown in the excerpt from a Load
Prediction MIB in Figure 154.

Figure 154. MIB Structure for Handling Object Values with Predictive
Capability.

Figure 155 displays the result of an SNMP query of the relevant
predictive MIB Object. Because the identifiers are suffixed by time,
the object values are sorted temporally. If a client wishes to know
the next predicted event after a given time, the query can be
formulated as a GET-NEXT with that time as the suffix. The
GET-NEXT-RESPONSE will contain the next predicted event along with
its time of occurrence. If no such predicted value yet exists, a
value outside the table will be returned.

This allows SNMP GET-NEXT operations from a client to locate the
event nearest to the requested time as well as to search in temporal
order for subsequent predicted events.
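The lookup described above can be sketched in Python. This is a
minimal illustration, not an SNMP agent: the class TimeIndexedTable
and its method names are hypothetical, and the sample rows are copied
from the loadPredictionPredictedLoad values in Figure 155. It assumes
GET-NEXT semantics, that is, the first row whose time index is
strictly greater than the requested suffix.

```python
import bisect


class TimeIndexedTable:
    """Hypothetical stand-in for a MIB table column indexed by predicted time."""

    def __init__(self):
        self._times = []   # sorted predicted times (the OID suffixes)
        self._values = {}  # predicted time -> predicted value

    def set(self, predicted_time, value):
        # Insert the time index once; keep the index list sorted.
        if predicted_time not in self._values:
            bisect.insort(self._times, predicted_time)
        self._values[predicted_time] = value

    def get_next(self, time_suffix):
        """Return (time, value) for the first row strictly after time_suffix.

        Returns None when no such row exists, which models a GET-NEXT that
        runs off the end of the table.
        """
        i = bisect.bisect_right(self._times, time_suffix)
        if i == len(self._times):
            return None
        t = self._times[i]
        return t, self._values[t]


# Rows taken from the loadPredictionPredictedLoad column of Figure 155.
table = TimeIndexedTable()
for t, load in [(4847, 240), (20000, 420), (40000, 460), (60000, 497)]:
    table.set(t, load)
```

With these rows, a request suffixed with time 25639 yields the entry
for time 40000, and a request at or past the last predicted time
yields None.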

B.4. A COMMON ALGORITHMIC DESCRIPTION

SNMP, as currently defined, assumes that non-algorithmic descriptive
information will be generated, handled, or transported. Prediction
requires model development and execution. This proposed standard
SUGGESTS that models are to be small, low-overhead, and
fine-grained. Fine-grained refers to the fact that the models are
locally constrained in time and space. In this section, we propose
algorithmic descriptions of management models designed to encourage
the understanding and use of in-line predictive management
techniques.




loadPrediction OBJECT IDENTIFIER ::= { loadPredMIB 1 }

loadPredictionTable OBJECT-TYPE
    SYNTAX      SEQUENCE OF LoadPredictionEntry
    MAX-ACCESS  not-accessible
    STATUS      current
    DESCRIPTION
        "Table of load prediction information."
    ::= { loadPrediction 1 }

loadPredictionEntry OBJECT-TYPE
    SYNTAX      LoadPredictionEntry
    MAX-ACCESS  not-accessible
    STATUS      current
    DESCRIPTION
        "Table of Atropos LP prediction information."
    INDEX       { loadPredictionPort }
    ::= { loadPredictionTable 1 }

LoadPredictionEntry ::= SEQUENCE {
    loadPredictionID             DisplayString,
    loadPredictionPredictedLoad  INTEGER,
    loadPredictionPredictedTime  INTEGER
}

loadPredictionID OBJECT-TYPE
    SYNTAX      DisplayString
    MAX-ACCESS  read-only
    STATUS      current
    DESCRIPTION
        "The LP identifier."
    ::= { loadPredictionEntry 1 }

loadPredictionPredictedLoad OBJECT-TYPE
    SYNTAX      INTEGER (0..2147483647)
    MAX-ACCESS  read-only
    STATUS      current
    DESCRIPTION
        "This is the predicted load on the link."
    ::= { loadPredictionEntry 2 }

loadPredictionPredictedCPUTime OBJECT-TYPE
    SYNTAX      INTEGER (0..2147483647)
    MAX-ACCESS  read-only
    STATUS      current
    DESCRIPTION
        "This is the predicted processor time used by a packet
        on this node."
    ::= { loadPredictionEntry 3 }

loadPredictionPredictedTime OBJECT-TYPE
    SYNTAX      INTEGER (0..2147483647)
    MAX-ACCESS  read-only
    STATUS      current
    DESCRIPTION
        "This is the time at which the predicted event will be valid."
    ::= { loadPredictionEntry 4 }


loadPredictionTable.loadPredictionEntry.loadPredictionID.1 -> OCTET STRING- (ascii): AN-1

loadPredictionTable.loadPredictionEntry.loadPredictionPort.1 -> INTEGER: 3325

loadPredictionTable.loadPredictionEntry.loadPredictionPredictedLoad.4847 -> INTEGER: 240
loadPredictionTable.loadPredictionEntry.loadPredictionPredictedLoad.20000 -> INTEGER: 420
loadPredictionTable.loadPredictionEntry.loadPredictionPredictedLoad.40000 -> INTEGER: 460
loadPredictionTable.loadPredictionEntry.loadPredictionPredictedLoad.60000 -> INTEGER: 497
loadPredictionTable.loadPredictionEntry.loadPredictionPredictedLoad.80000 -> INTEGER: 540
loadPredictionTable.loadPredictionEntry.loadPredictionPredictedLoad.100000 -> INTEGER: 580
loadPredictionTable.loadPredictionEntry.loadPredictionPredictedLoad.120000 -> INTEGER: 619
loadPredictionTable.loadPredictionEntry.loadPredictionPredictedLoad.140000 -> INTEGER: 660

loadPredictionTable.loadPredictionEntry.loadPredictionPredictedTime.4847 -> INTEGER: 4847
loadPredictionTable.loadPredictionEntry.loadPredictionPredictedTime.20000 -> INTEGER: 20000
loadPredictionTable.loadPredictionEntry.loadPredictionPredictedTime.40000 -> INTEGER: 40000
loadPredictionTable.loadPredictionEntry.loadPredictionPredictedTime.60000 -> INTEGER: 60000
loadPredictionTable.loadPredictionEntry.loadPredictionPredictedTime.80000 -> INTEGER: 80000
loadPredictionTable.loadPredictionEntry.loadPredictionPredictedTime.100000 -> INTEGER: 100000
loadPredictionTable.loadPredictionEntry.loadPredictionPredictedTime.120000 -> INTEGER: 120000
loadPredictionTable.loadPredictionEntry.loadPredictionPredictedTime.140000 -> INTEGER: 140000

loadPredictionTable.loadPredictionEntry.loadPredictionCurrentLoad.1 -> INTEGER: 15949
loadPredictionTable.loadPredictionEntry.loadPredictionCurrentTime.1 -> INTEGER: 25639

Figure 155. Output from a Query of the MIB Structure for Handling
Object Values with Predictive Capability.

Case Diagrams [4] provide a well-known representation for the
relation of management information to information flow, as shown in
Figure 156. The details of Case Diagrams will not be discussed here
(see the previous reference for more information). The purpose of
this section is to illustrate an enhancement to the diagram that
allows algorithmic information to be specified, particularly for
multi-party predictive model interaction.

An excerpt of an SNMP Case Diagram serves to provide a flavor of its
current format. The diagram below shows packets arriving from a
lower network layer. Some packets are determined to have encoding
errors and are discarded. The remaining packets flow to the upper
layer.

      ^  Upper Layer
      |
    ==+==  outPackets
      |
      |
      +==>  encodingErrors
      |
      |
    ==+==  inPackets
      |  Lower Layer

Figure 156. An Example Case Diagram.

For the purposes of in-line predictive management, models SHOULD be
specified and injected into the system. These models MAY coexist
with the current SNMP management model, supplementing the
information with predictive values. This is denoted by adding
algorithmic model information to the Case Diagram. A '+' sign after
the name of an Object Identifier identifies the object as one that
can return future values. The model used to predict the future
information is written within braces near the Object Identifier and
incorporates the names of the SNMP object identifiers. This document
SUGGESTS using a common syntax for the notation, such as that used
for code blocks in the C or Java programming languages, or the
notation used by any number of other languages. Standardization of
the model syntax is outside the scope of this document. All
functions MUST be defined. Operating system function calls MAY NOT
be used. The salient point is that the algorithm must be clearly and
concisely defined. The algorithm must also be a faithful
representation of the actual predictive model injected into the
system.

      ^  Upper Layer
      |
    ==+==  outPackets
      |
      +==>  encodingErrors+ { 0.1 * inPackets }
      |
    ==+==  inPackets
      |  Lower Layer

Figure 157. A Sample Algorithmic Description.

As shown in Figure 157, "encodingErrors" is predictively enhanced to
be 10% of "inPackets" for future values. The predictive algorithm
MUST run on the network node and MUST be immediately available as
input for other predictively enhanced objects. The predicted value
MUST be available as a response to SNMP queries for future state
information, or for transfer to other nodes via virtual messages,
explained later in this document. SNMP Objects that are enhanced
with predictive capability are assumed to always have the actual
monitored value at Wallclock time.
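A rough Python sketch of a predictively enhanced object follows. The
class name PredictiveObject, its method, and the use of a plain
dictionary for input objects are assumptions made for illustration;
only the model 0.1 * inPackets comes from Figure 157. The sketch
returns the monitored value when asked for Wallclock time and the
model output for any later time.

```python
class PredictiveObject:
    """Hypothetical SNMP object enhanced with a predictive model ('+' object)."""

    def __init__(self, name, model):
        self.name = name
        self.model = model      # callable: dict of input objects -> prediction
        self.monitored = 0      # actual value observed at Wallclock time

    def value(self, query_time, wallclock, inputs):
        # At Wallclock (or earlier, as a simplification) report the actual
        # monitored value.
        if query_time <= wallclock:
            return self.monitored
        # For future times, evaluate the algorithmic model on the node.
        return self.model(inputs)


# Model from Figure 157: encodingErrors+ { 0.1 * inPackets }
encoding_errors = PredictiveObject(
    "encodingErrors", lambda objs: 0.1 * objs["inPackets"]
)
encoding_errors.monitored = 7  # hypothetical monitored count

current = encoding_errors.value(query_time=100, wallclock=100,
                                inputs={"inPackets": 250})
future = encoding_errors.value(query_time=200, wallclock=100,
                               inputs={"inPackets": 250})
```

Keeping the model as a callable over named input objects mirrors the
brace notation in the Case Diagram, where the model refers to other
SNMP objects by name.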

If this were a wireless network, a more realistic algorithmic model
would likely incorporate channel quality SNMP Objects into the
"encodingErrors" prediction algorithm. In many cases, the
algorithmic portion of the Case Diagram will involve SNMP objects
from other nodes. The syntax should include the ability to identify
general topological information in the description of external
objects. For example, "inPackets[adj]" or "inPackets[edge]" should
indicate immediately adjacent nodes or nodes at the topological edge
of the network, respectively.

In the example shown in Figure 158, a "packetsForwarded" object has
predictive capability, denoted by the '+' symbol. The predictive
capability comes from an algorithmic model specified within the
braces next to the object name. In this case, the prediction will be
the value of the "driverForwarded" object from the node closest to
the edge of the network.

      ^  Upper Layer
      |
    ==+==  outPackets
      |
      +==>  packetsForwarded+ { driverForwarded[edge] }
      |
    ==+==  inPackets
      |  Lower Layer

Figure 158. An Algorithmic Description Using State Generated from
Another Node Described in Figure 159.

      ^  Upper Layer
      |
    ==+==  driverPackets
      |
      |
      +==>  driverForwarded+
      |       { delta * (appPackets(t-epsilon) - appPackets(t)) / epsilon }
      |
    ==+==  inPackets
      |  Lower Layer

Figure 159. A Node Generating State Information Used by the Node in
Figure 158.

In Figure 159, which is an SNMP diagram of the edge node, the
"driverForwarded" object is predicted by executing the algorithm in
braces. This algorithm predicts "driverForwarded" packets to be a
linear approximation of a sample of "appPackets". The samples are
"epsilon" time units apart and the prediction is "delta" time units
into the future.
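The linear approximation can be written out as a small Python
function. This is a sketch under stated assumptions: the function
name predict_increment and the sample numbers are hypothetical, and
the slope is taken as the newer sample minus the older one so that a
growing packet count gives a positive prediction, whereas Figure 159
writes the two samples in the opposite order.

```python
def predict_increment(app_packets_now, app_packets_prev, epsilon, delta):
    """Linear estimate of how many packets arrive over the next delta time units.

    app_packets_now  -- appPackets sampled at time t
    app_packets_prev -- appPackets sampled at time t - epsilon
    epsilon          -- spacing between the two samples (must be > 0)
    delta            -- how far into the future the prediction reaches
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    slope = (app_packets_now - app_packets_prev) / epsilon  # packets per time unit
    return delta * slope


# Hypothetical samples: 1200 packets at t - 10, 1500 packets at t.
driver_forwarded = predict_increment(1500, 1200, epsilon=10, delta=4)
```

A smaller epsilon tracks recent behavior more closely but is more
sensitive to bursts; a larger delta extends the lookahead at the cost
of accuracy.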


B.5.  MULTI-PARTY  MODEL  INTERACTION 

Multiple developers and administrators of in-line predictive
algorithmic models will require mechanisms to ensure correct
understanding and operation of each other's models and intentions.

Model  Registration 

It may be necessary to register predictive models. Registration is
often an IANA function [6]. Algorithmic model registration needs to
be handled more dynamically than AgentX models. Algorithmic models,
while not necessarily doing so, have the capability to install and
deinstall at rapid rates. The proposed standard for in-line model
installation and deinstallation is described in Section 7.

Model  Interaction 

Multiple models residing on a node need to interoperate with one
another. This document proposes to use SNMP Object Identifiers as
much as possible for communication of state information among
models. In addition, multiple Active Application models may choose
to communicate with one another via global state.

Co-existence  with  Legacy  SNMP 

Querying an IP addressable node for SNMP objects that are
predictively enhanced should appear transparent to the person
polling the node. Multiple ports, etc. should not be required. A
program injected into a node that serves to extend an SNMP MIB MAY
do so using global state. A global state cache holds the SNMP object
values and connects via an internal port to a master SNMP agent for
the node.

B.6. A COMMON PREDICTIVE FRAMEWORK

This section specifies an algorithmic predictive management
framework. The framework allows details of distributed simulation,
such as time management, state saving, and model development, to be
implementation dependent while ensuring in-line interoperability
both with, and within, the network. The general predictive network
management architecture MUST contain at least one Driving Process
(DP), MAY contain Logical Processes (LP), and MUST use Virtual
Messages (VM).

Figure 160 illustrates network nodes containing DPs and LPs. The
annotations under nodes AH-1 and AN-1 are SNMP Object Identifiers.
SNMP Object Identifier "oid1" represents the state of node AH-1. The
predictively enhanced SNMP Object Identifier, "oid+", on node AN-1
is a function of "oid1". Note that "f()" is shown as an arbitrary
function in the figure, but MUST be well-defined in practice.

Figure 160. Framework Entity Types.

The framework makes a distinction between a Physical Process and a
Logical Process. A Physical Process is nothing more than an
executable task defined by program code, i.e., it is the
implementation of a particular model, a hardware component, or a
direct connection to a hardware component representing a device. An
example of a Physical Process is the packet forwarding process on a
router. Each Physical Process MUST be encapsulated within a Logical
Process, labeled LP in Figure 160. A Logical Process consists of a
Physical Process, or a model of the Physical Process, and additional
implementation-specific data structures and instructions to maintain
message order and correct operation as the system executes ahead of
current (or Wallclock) time, as illustrated in greater detail in
Figure 160. The details of the DP and LP structure and operation are
implementation specific, while the inter-operation of the DP/LP
system must be specified. The LP architecture is abstracted in
Figure 161. The flow of messages through the LP is shown by the
arrows entering from the left side of the figure. The in-line
predictive framework components are shown in Figure 160, where AH-1
and AN-1 are Active Host 1 and Active Node 1, respectively. In this
context, active hosts are nodes that can inject new packets into the
network, while active nodes are nodes that behave as intermediate
hops in a network.

Figure 161. A High-level View of the Logical Process Framework
Component within an Active Application.

The Logical Process MUST handle time management for the model. The
Logical Process and the model that it implements MAY be implemented
in any manner; however, they must be capable of interoperating. The
framework MUST be capable of supporting both conservative and
optimistic time management within the network. Conservative time
management REQUIRES that the model block when messages MAY be
received out-of-order, while optimistic time management MAY allow
model processing to continue, even when messages are received
out-of-order. However, additional implementation-specific mechanisms
MAY be used to account for out-of-order messages. Such mechanisms
MAY be embedded within the Logical Process and this specification
does not attempt to standardize them.

Virtual input messages directed to a Logical Process MUST be
received by the Logical Process, passed to the model, and processed.
Virtual output messages MAY be generated as a result.

Virtual  messages  contain  the  following  fields: 

• Send Time (TS), which MUST contain the LVT (local simulation time)
at which the message was sent

• Receive Time (RT), which MUST denote the time at which the message
is expected to exist in the future

• Anti-toggle (A) bit, which MAY be included for out-of-order
message handling purposes such as message cancellation and rollback

• Message (M), which MUST contain the message content itself and is
model specific

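Thus, a Virtual Message (VM) MUST have the following structure:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Send-Time (TS)                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       Receive-Time (RT)                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Real-Time (TR)                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|A|                                                             |
+-+                         Message (M)                         +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 162. An In-line Management Prediction Virtual Message.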
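One possible encoding of this layout is sketched below in Python.
The field widths are assumptions read off the 32-bit ruler in
Figure 162: TS, RT, and TR are taken to be unsigned 32-bit integers
in network byte order, and the A bit is assumed to be the most
significant bit of a fourth 32-bit word whose remaining bits are
unused. The class and method names are hypothetical.

```python
import struct
from dataclasses import dataclass, replace

_HEADER = struct.Struct("!IIII")  # TS, RT, TR, flags word (A = top bit)
_ANTI_BIT = 0x80000000


@dataclass(frozen=True)
class VirtualMessage:
    send_time: int      # TS: LVT at which the message was sent
    receive_time: int   # RT: time at which the content is predicted valid
    real_time: int      # TR: Wallclock time stamp
    anti: bool          # A: set on an anti-message
    message: bytes      # M: opaque, model-specific content

    def pack(self) -> bytes:
        flags = _ANTI_BIT if self.anti else 0
        header = _HEADER.pack(self.send_time, self.receive_time,
                              self.real_time, flags)
        return header + self.message

    @classmethod
    def unpack(cls, data: bytes) -> "VirtualMessage":
        ts, rt, tr, flags = _HEADER.unpack_from(data)
        return cls(ts, rt, tr, bool(flags & _ANTI_BIT), data[_HEADER.size:])

    def anti_message(self) -> "VirtualMessage":
        # Exact duplicate of the message with the Anti-toggle bit set.
        return replace(self, anti=True)


vm = VirtualMessage(send_time=4847, receive_time=20000, real_time=4000,
                    anti=False, message=b"load=420")
wire = vm.pack()
```

Treating the message content as opaque bytes follows the suggestion
in the definitions that M be opaque; an SNMP varbind or executable
code would simply be carried in that field.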

Virtual messages that contain invalid fields because the
transmitting Logical Process used an incompatible time management
technique MUST be dropped. However, it is SUGGESTED that a count of
such packets be maintained in a general in-line predictive
management framework MIB. The Receive Time field MUST be filled with
the time that this message is predicted to be valid at the
destination Logical Process. The Send Time field MUST be filled with
the time that this message was sent by the originating Logical
Process. The Anti-Toggle (A) field MUST be used for creating an
anti-message to remove the effects of false messages as described
later. A message MUST also contain a field for the current Real Time
(TR). If a message arrives at a Logical Process out-of-order or with
invalid information, that is, outside a pre-specified tolerance for
prediction accuracy, it is called a false message. The method for
handling false messages is implementation specific.

Figure 163. A Logical Process Implementation and Interface.

The Receive Queue, shown in Figure 163, maintains newly arriving
messages in order by Receive Time (RT). The implementation of the
Receive Queue is implementation specific.
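Since the Receive Queue implementation is left open, the following
Python sketch shows only one possibility: a binary heap keyed on
Receive Time. The names ReceiveQueue, push, and pop are hypothetical,
and messages are represented as plain (receive_time, payload) pairs
rather than full virtual messages.

```python
import heapq
import itertools


class ReceiveQueue:
    """Hypothetical Receive Queue that releases messages in Receive Time order."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker keeps arrival order stable

    def push(self, receive_time, payload):
        heapq.heappush(self._heap, (receive_time, next(self._seq), payload))

    def pop(self):
        """Remove and return (receive_time, payload) with the smallest RT."""
        receive_time, _, payload = heapq.heappop(self._heap)
        return receive_time, payload

    def __len__(self):
        return len(self._heap)


rq = ReceiveQueue()
for rt, payload in [(40000, "load=460"), (4847, "load=240"), (20000, "load=420")]:
    rq.push(rt, payload)

ordered = [rq.pop() for _ in range(len(rq))]
```

The sequence counter means two messages with the same Receive Time
leave the queue in the order they arrived, without ever comparing
payloads.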

The Driving and Logical Processes MUST communicate via virtual
messages as shown in Figure 164. The Driving Process MAY generate
predictions based upon SNMP queries of other layers on the local
node. The Logical Process MAY check its prediction accuracy via SNMP
queries of other layers on its local node.


[Figure 164 diagram: two protocol stacks, labeled Driving Process
and Logical Process. The labels appearing in the stacks are DP or
LP, SNMP, Virtual Messages, ANEP, and IP.]

Figure 164. Facility for Checking Accuracy with Actual Network SNMP
Objects in the In-line Predictive Management Framework.

The in-line predictive framework MAY allow for prediction refinement
and correction by communicating with the actual component whose
state is to be predicted via an SNMP query. The asynchronous
prediction mechanism has the following architecture for the Logical
Process (Figure 163).

All of the Logical Process queues and caches MAY reside in an active
node's Small-State. Small-State is a persistent memory cache left
behind by an active packet that is available to trailing active
packets that have the proper access rights. Typically, any type of
information can be stored in Small-State.

The Receive Queue MAY maintain active virtual message ordering and
scheduling. All active packets MUST be encapsulated inside Active
Packets following the Active Network Encapsulation Protocol [7]
format. Once a virtual message leaves the Receive Queue, the virtual
time of the Logical Process, known as Local Virtual Time, MUST be
updated to the value of the Receive Time from the departing virtual
message. Virtual messages MUST originate from Driving Processes,
shown in Figure 160, that predict future events and inject them into
the system as virtual messages. The development of a Driving Process
and Logical Process is dependent upon the model used to enhance the
desired state of the system with predictive capability. Logical
Processes MUST only operate upon the arrival of virtual input
messages and MUST NEVER spontaneously generate virtual messages.

Following the arrows across Figure 163, virtual messages enter the
Physical Process. The state of the Logical Process is periodically
saved in the State Queue (SQ), shown as the State Cache in
Figure 163. State Queue values are used to restore the Logical
Process to a known safe state when false messages are received.
State values are continuously compared with actual values from the
Physical Process to check for prediction accuracy, which in the case
of load prediction is the number and arrival times of predicted and
actual packets received. If the prediction error exceeds a specified
tolerance, a rollback MAY occur.
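The save-and-check behavior can be sketched as follows. This is an
illustrative Python fragment, not the framework's specified
algorithm: the class LogicalProcessState, the dictionary used as the
State Queue, and the rollback rule (discard predictions for times
after the failed check, record the actual value, and reset LVT to
that time) are all assumptions.

```python
class LogicalProcessState:
    """Hypothetical State Queue with a tolerance check and rollback."""

    def __init__(self, tolerance):
        self.tolerance = tolerance
        self.lvt = 0            # Local Virtual Time
        self.state_queue = {}   # LVT -> predicted value saved at that time
        self.rollbacks = 0

    def save(self, lvt, predicted_value):
        # Cache the predicted state and advance Local Virtual Time.
        self.state_queue[lvt] = predicted_value
        self.lvt = max(self.lvt, lvt)

    def check(self, wallclock, actual_value):
        """Compare the prediction cached for `wallclock` with the actual value.

        Returns True when the prediction is within tolerance (or no
        prediction was cached). Otherwise later predictions are discarded,
        LVT is rolled back to `wallclock`, and False is returned.
        """
        predicted = self.state_queue.get(wallclock)
        if predicted is None or abs(predicted - actual_value) <= self.tolerance:
            return True
        # Rollback: keep only state saved before the failed check.
        self.state_queue = {t: v for t, v in self.state_queue.items()
                            if t < wallclock}
        self.state_queue[wallclock] = actual_value
        self.lvt = wallclock
        self.rollbacks += 1
        return False


lp = LogicalProcessState(tolerance=10)
for t, load in [(20000, 420), (40000, 460), (60000, 497)]:
    lp.save(t, load)

ok = lp.check(wallclock=20000, actual_value=425)                 # error of 5
rolled_back = not lp.check(wallclock=40000, actual_value=520)    # error of 60
```

After the second check the prediction for time 60000 is gone and the
process would resume predicting from time 40000; in a full
implementation this is also the point at which anti-messages would
be sent for messages held in the Send Queue.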

An important part of the architecture for network
management is the fact that the State Queue
within the in-line management prediction architecture
is the node's Management Information Base.
The State Queue values are the SNMP Management
Information Base Object values; but unlike legacy
SNMP values, these values are expected to occur in
the future. The State Queue operation is implementation
dependent; however, it holds the predicted
SNMP Objects, is SUGGESTED to be implemented
in small-state, and MUST use the interface specified
in Section 7.2 to respond to SNMP queries. The current
version of SNMP has no mechanism to indicate
that a managed object is reporting its future state;
currently all results are reported with a timestamp
that contains the current time. In predictive active
network management there is a need for managed
entities to report their state information at times
in the future. These times are unknown to the
requester. A simple means to request and respond
with future time information is to append the future
time to all Management Information Base Object
Identifiers that are predicted.
This requires making these objects members of a
Management Information Base table indexed by
predicted time, as discussed in Section 2. This can be
seen in the load Prediction Table shown in
Figure 154. Thus a Simple Network Management
Protocol client that does not know the exact time of
the next predicted value can issue a get-next command
appending the current time to the known
object identifier. The managed object responds with
the requested object valid at the closest future time.
The figure illustrates an SNMP request and the
corresponding response.
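
The time-indexed get-next lookup can be sketched in Python as
follows. This is an illustration under stated assumptions, not
an SNMP implementation: no SNMP library is used, the object
name `loadPredictionValue` is invented, and object identifiers
are modeled as a name with a single integer time suffix.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple


class PredictionTable:
    """Predicted values for one object, indexed by predicted time (hypothetical)."""

    def __init__(self, base_oid: str) -> None:
        self.base_oid = base_oid
        self._rows: Dict[int, int] = {}   # predicted time -> predicted value

    def set(self, predicted_time: int, value: int) -> None:
        self._rows[predicted_time] = value

    def get_next(self, oid: str) -> Optional[Tuple[str, int]]:
        """Return the first row whose time index is greater than the one in `oid`."""
        prefix = self.base_oid + "."
        if not oid.startswith(prefix):
            return None
        requested = int(oid[len(prefix):])
        times: List[int] = sorted(self._rows)
        i = bisect_right(times, requested)
        if i == len(times):
            return None                   # no prediction beyond the requested time
        t = times[i]
        return f"{self.base_oid}.{t}", self._rows[t]
```

A client that appends the current time, say 1000, to the known
identifier receives the row for the closest later predicted
time without having to know that time in advance.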

Future times are the LVT of the Logical Process
running on a particular node. As Wallclock
approaches a particular future time, predicted values
MAY be adjusted, allowing the prediction to
become more accurate. The table of future values
MAY be maintained within a sliding Lookahead window,
so that old values are removed and the prediction
does not exceed a given future time. Continuing
along the arrows in Figure 161, any virtual messages
that are generated as a result of the Physical Process
or model computation proceed to the Send Queue
(QS).
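
One possible reading of the sliding Lookahead window is shown
in the short Python sketch below. The function name is
hypothetical, and the window bounds are an assumption: values
older than Wallclock are treated as "old", and values beyond
Wallclock plus the lookahead are treated as exceeding the
allowed future time.

```python
from typing import Dict


def slide_window(table: Dict[int, float], wallclock: int,
                 lookahead: int) -> Dict[int, float]:
    """Keep only predictions with wallclock <= time <= wallclock + lookahead."""
    horizon = wallclock + lookahead
    return {t: v for t, v in table.items() if wallclock <= t <= horizon}
```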

The Send Queue is implementation dependent;
however, it MAY maintain copies of virtual messages
to be transmitted in order of their send times. The
Send Queue is required for the generation of
anti-messages during rollback. Anti-messages annihilate
corresponding virtual messages when they meet,
correcting for previously sent false messages.
Annihilation is simply the removal of both the actual
message and the anti-message. Where the annihilation
occurs is implementation specific and left to the
implementor. After leaving the Send Queue, virtual messages
travel to their destination Logical Process. Further
details on the optimistic synchronization mechanism
are implementation dependent and outside the
scope of this work in progress.
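
The following Python sketch shows one way the Send Queue and
anti-message annihilation could fit together. The `msg_id`
field used to pair a message with its anti-message, and the
choice to annihilate at the receiver, are assumptions; the
text leaves both to the implementor.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Message:
    msg_id: int          # identifier shared by a message and its anti-message
    send_time: float     # virtual time at which the message was sent
    anti: bool = False   # True for an anti-message


class SendQueue:
    """Keeps copies of sent virtual messages in send-time order (hypothetical)."""

    def __init__(self) -> None:
        self._sent: List[Message] = []

    def record(self, msg: Message) -> None:
        self._sent.append(msg)
        self._sent.sort(key=lambda m: m.send_time)

    def rollback(self, to_time: float) -> List[Message]:
        """Drop copies sent after `to_time` and return their anti-messages."""
        cancelled = [m for m in self._sent if m.send_time > to_time]
        self._sent = [m for m in self._sent if m.send_time <= to_time]
        return [Message(m.msg_id, m.send_time, anti=True) for m in cancelled]


def annihilate(queue: List[Message], anti: Message) -> List[Message]:
    """Remove the message matching `anti`; the anti-message is discarded too."""
    return [m for m in queue if not (m.msg_id == anti.msg_id and not m.anti)]
```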

B.7. SUMMARY OF IN-LINE PREDICTION
REQUIREMENTS

An in-line management prediction model developer
MUST implement at least one Driving Process
and MAY implement a Logical Process using the
same time management technique. The model
developer MAY include an SNMP client within the
model to query the modeled component in
order to improve prediction accuracy. The model
developer's Driving Process MUST generate virtual
messages. The Logical Process MUST receive and
process those messages. The Logical Process MAY
respond to virtual messages by generating virtual
message(s). The Logical Process MAY use active network
node small-state to hold a time series of the
SNMP Object Id whose value is being continuously
predicted. The interface to the SNMP MIB
small-state is specified in the following section.

B.8. DETAILS OF THE ACTIVE NETWORK
INTERFACE

The  general  active  network  architectural  framework, 
without  any  specific  network  management  paradigm 
implementation,  is  shown  in  Figure  165. 


[Figure: Active Applications layer (AA 1, AA 2, AA 3, AA 4).]

In-line network management prediction requires
a general active network framework that allows
active applications to be injected into the proper
execution environments. The in-line management
prediction framework enforces certain minimal
requirements on the execution environment, which
are listed below.

Information  Caches 

The execution environment MUST provide an information
cache called 'Small State', as defined in Section 1.3,
to enable information exchange between
active packets, defined in Section 1.3. The execution
environment MAY also provide an information
cache called 'Global State', defined in Section 1.3, to
enable the in-line management prediction framework
to communicate with a predictively managed
active application to query its current state. The EE
MUST provide an API to store and query
both 'Small State' and 'Global State', if it is
implemented. The EE SHOULD provide appropriate
access control mechanisms for both 'Small State'
and 'Global State', if it is implemented.
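
As a rough illustration of the cache requirements above, the
Python sketch below models a cache with a store/query API and
a simple access check. Everything here is assumed for the
example: the class name, the use of an allow-list of principal
names for access control, and the string identities.

```python
from typing import Any, Dict, Optional, Set


class StateCache:
    """Named key/value cache with a per-cache allow-list (hypothetical)."""

    def __init__(self, name: str, allowed: Set[str]) -> None:
        self.name = name
        self._allowed = set(allowed)      # identities permitted to use this cache
        self._data: Dict[str, Any] = {}

    def _check(self, principal: str) -> None:
        if principal not in self._allowed:
            raise PermissionError(f"{principal} may not access {self.name}")

    def store(self, principal: str, key: str, value: Any) -> None:
        self._check(principal)
        self._data[key] = value

    def query(self, principal: str, key: str) -> Optional[Any]:
        self._check(principal)
        return self._data.get(key)
```

Two active packets that share a 'Small State' instance can
then exchange a value: one calls `store`, the other calls
`query` with the same key.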

Interface  to  SNMP 

The execution environment MUST provide an interface
that enables both the in-line management prediction
values and the values of the actual
component being managed to publish their state to
an SNMP MIB. This enables the in-line management
prediction framework to store the predicted state in
a well-known format and also enables legacy SNMP
tools to query the predicted state using SNMP operations.
Additionally, the managed application is also
able to update its current state using SNMP, which
the Logical Process will be able to query. In a particular
implementation of such an interface, a generic
SNMP agent coded as an active application MAY be
injected into the active nodes. The agent creates a
'Global State' on the active node with a well-known
name. The agent reads information coded in a
known format that has been written to the 'Global
State' and publishes it to the MIB. Any active application
that wishes to advertise its state uses an interface
that enables it to store its information in the
well-known 'Global State' in the given format.

Figure 165. The Active Network Framework.

The format of the messages that are posted
between the SNMP agent and an active application
is shown in Figure 166.


+--------------+-----------+
| Message Type | Object ID |
+--------------+-----------+
|          Value           |
+--------------------------+

Figure 166. Message Packet.


The SNMP Agent and the active application MAY
use special interfaces to implement messaging
between them. A Message Packet, whose format is
shown in Figure 166, is the basic unit of inter-application
communication. Each message carries a
message type. The type SHOULD assume one of the
following values:

• MSG_ADDINT: to add a new MIB Object of
type SNMP INTEGER

• MSG_UPDATEINT: to update the value of an
MIB Object of type SNMP INTEGER

• MSG_GETINT: to get the value of an MIB
Object of type SNMP INTEGER

• MSG_ADDLONG: to add a new MIB Object of
type SNMP LONG

• MSG_UPDATELONG: to update the value of
an MIB Object of type SNMP LONG

• MSG_GETLONG: to get the value of an MIB
Object of type SNMP LONG

• MSG_ADDSTRING: to add a new MIB Object
of type SNMP STRING

• MSG_GETSTRING: to get the value of an MIB
Object of type SNMP STRING

• MSG_UPDATESTRING: to update the value of
an MIB Object of type SNMP STRING

The active application SHOULD send a message
of the valid message type to the SNMP agent to
perform the required operation. On receipt of a
message, the SNMP agent SHOULD attempt to perform
the requested operation. It MUST then
respond with an acknowledgment message in the
format shown in Figure 167.

The acknowledgment message has the following
format.

| Status Code |

Figure 167. Acknowledgment Message Packet.

The status code MUST have one of the following
values:

• OK: to indicate successful operation

• ERR_DUPENTRY: if, for a MSG_ADD operation,
an Object identifier of the given name already
exists

• ERR_NOSUCHID: if, for a MSG_UPDATE operation,
an Object identifier of the given name does
not exist.

The Status message MAY be any descriptive
string explaining the nature of the failure, and
SHOULD be "Success" for a successful operation.
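
The request and acknowledgment exchange can be sketched in
Python as below. This is a toy stand-in, not the agent's
actual code: the class names are invented, the status names
are written here as ERR_DUPENTRY and ERR_NOSUCHID, the MIB is
modeled as a dictionary, and the MSG_GET types are left out
because the text does not say how a value is returned.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union

Value = Union[int, str]

OK = "OK"
ERR_DUPENTRY = "ERR_DUPENTRY"
ERR_NOSUCHID = "ERR_NOSUCHID"


@dataclass
class MessagePacket:
    msg_type: str        # e.g. "MSG_ADDINT" or "MSG_UPDATESTRING"
    object_id: str
    value: Value


class AgentStub:
    """Toy stand-in for the SNMP agent's message handler (hypothetical)."""

    def __init__(self) -> None:
        self._mib: Dict[str, Value] = {}

    def handle(self, pkt: MessagePacket) -> Tuple[str, str]:
        """Apply one request and return (status code, status message)."""
        if pkt.msg_type.startswith("MSG_ADD"):
            if pkt.object_id in self._mib:
                return ERR_DUPENTRY, f"{pkt.object_id} already exists"
            self._mib[pkt.object_id] = pkt.value
            return OK, "Success"
        if pkt.msg_type.startswith("MSG_UPDATE"):
            if pkt.object_id not in self._mib:
                return ERR_NOSUCHID, f"{pkt.object_id} does not exist"
            self._mib[pkt.object_id] = pkt.value
            return OK, "Success"
        # MSG_GET* and unknown types are not modeled in this sketch.
        raise NotImplementedError(pkt.msg_type)
```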

B.9.  IMPLEMENTATION 

Models injected into the network allow network state
to be predicted and efficiently propagated throughout
the active network, enabling the network to operate
in real time and simultaneously project the
future state of the network. Network state information,
such as load, capacity, security, mobility, faults,
and other state information with supporting models,
is automatically available for use by the management
system with current values and with values
expected to exist in the future. In the current version,
sample load and processor usage prediction
applications have been experimentally validated
using the Atropos Toolkit [11]. The toolkit's distributed
simulation infrastructure takes advantage of
parallel processing within the network, because computation
occurs concurrently at all participating
active nodes. The network being emulated can be
queried in real time to verify the prediction accuracy.
Measures such as rollbacks are taken to keep
the simulation in line with actual performance.


Predictive  In-line  Management  Information 
Base 

Further details on the in-line network management
prediction concept can be found in Active Networks
and Active Network Management [1]. The SNMP
MIB for the in-line predictive management system
described in this proposed standard follows in the
next section.

Figure 168. The Atropos MIB. (Printouts appear on the
following pages.)

B.10. SECURITY CONSIDERATIONS

Clearly, the power and flexibility to increase performance
via the ability to inject algorithmic information
also have security implications. Fundamental
active network framework security implications will
be discussed in [10].

ATROPOS-MIB DEFINITIONS ::= BEGIN

IMPORTS
    MODULE-IDENTITY, OBJECT-TYPE, experimental,
    Counter32, TimeTicks
        FROM SNMPv2-SMI
    DisplayString
        FROM SNMPv2-TC;

atroposMIB MODULE-IDENTITY
    LAST-UPDATED "9801010000Z"
    ORGANIZATION "GE CRD"
    CONTACT-INFO
        "Stephen F. Bush bushsf@crd.ge.com"
    DESCRIPTION
        "Experimental MIB modules for the Active Virtual Network
         Management Prediction (Atropos) system."
    ::= { experimental active(75) 4 }


-- Logical Process Table

lP OBJECT IDENTIFIER ::= { atroposMIB 1 }

lPTable OBJECT-TYPE
    SYNTAX SEQUENCE OF LPEntry
    MAX-ACCESS not-accessible
    STATUS current
    DESCRIPTION
        "Table of Atropos LP information."
    ::= { lP 1 }

lPEntry OBJECT-TYPE
    SYNTAX LPEntry
    MAX-ACCESS not-accessible
    STATUS current
    DESCRIPTION
        "Table of Atropos LP information."
    INDEX { lPIndex }
    ::= { lPTable 1 }


LPEntry ::= SEQUENCE {
    lPIndex               INTEGER,
    lPID                  DisplayString,
    lPLVT                 INTEGER,
    lPQRSize              INTEGER,
    lPQSSize              INTEGER,
    lPCausalityRollbacks  INTEGER,
    lPToleranceRollbacks  INTEGER,
    lPSQSize              INTEGER,
    lPTolerance           INTEGER,
    lPGVT                 INTEGER,
    lPLookAhead           INTEGER,
    lPGvtUpdate           INTEGER,
    lPStepSize            INTEGER,
    lPReal                INTEGER,
    lPVirtual             INTEGER,
    lPNumPkts             INTEGER,
    lPNumAnti             INTEGER,
    lPPredAcc             DisplayString,
    lPPropX               DisplayString,
    lPPropY               DisplayString,
    lPETask               DisplayString,
    lPETrb                DisplayString,
    lPVmRate              DisplayString,
    lPReRate              DisplayString,
    lPSpeedup             DisplayString,
    lPLookahead           DisplayString,
    lPNumNoState          INTEGER,
    lPStatePred           DisplayString,
    lPPktPred             DisplayString,
    lPTdiff               DisplayString,
    lPStateError          DisplayString,
    lPUptime              TimeTicks
}


lPIndex OBJECT-TYPE
    SYNTAX INTEGER (0..2147483647)
    MAX-ACCESS not-accessible
    STATUS current
    DESCRIPTION
        "The LP table index."
    ::= { lPEntry 1 }

lPID OBJECT-TYPE
    SYNTAX DisplayString
    MAX-ACCESS read-only
    STATUS current
    DESCRIPTION
        "The LP identifier."
    ::= { lPEntry 2 }

lPLVT OBJECT-TYPE
    SYNTAX INTEGER (0..2147483647)
    MAX-ACCESS read-only
    STATUS current
    DESCRIPTION
        "This is the LP Local Virtual Time."
    ::= { lPEntry 3 }

lPQRSize OBJECT-TYPE
    SYNTAX INTEGER (0..2147483647)
    MAX-ACCESS read-only
    STATUS current
    DESCRIPTION
        "This is the LP Receive Queue Size."
    ::= { lPEntry 4 }

lPQSSize OBJECT-TYPE
    SYNTAX INTEGER (0..2147483647)
    MAX-ACCESS read-only
    STATUS current
    DESCRIPTION
        "This is the LP send queue size."
    ::= { lPEntry 5 }

lPCausalityRollbacks OBJECT-TYPE
    SYNTAX INTEGER (0..2147483647)
    MAX-ACCESS read-only
    STATUS current
    DESCRIPTION
        "This is the number of rollbacks this LP has suffered."
    ::= { lPEntry 6 }

lPToleranceRollbacks OBJECT-TYPE
    SYNTAX INTEGER (0..2147483647)
    MAX-ACCESS read-only
    STATUS current
    DESCRIPTION
        "This is the number of rollbacks this LP has suffered."
    ::= { lPEntry 7 }

lPSQSize OBJECT-TYPE
    SYNTAX INTEGER (0..2147483647)
    MAX-ACCESS read-only
    STATUS current
    DESCRIPTION
        "This is the LP state queue size."
    ::= { lPEntry 8 }

lPTolerance OBJECT-TYPE
    SYNTAX INTEGER (0..2147483647)
    MAX-ACCESS read-only
    STATUS current
    DESCRIPTION
        "This is the allowable deviation between process's
         predicted state and the actual state."
    ::= { lPEntry 9 }

lPGVT OBJECT-TYPE
    SYNTAX INTEGER (0..2147483647)
    MAX-ACCESS read-only
    STATUS current
    DESCRIPTION
        "This is this system's notion of Global Virtual Time."
    ::= { lPEntry 10 }

lPLookAhead OBJECT-TYPE
    SYNTAX INTEGER (0..2147483647)
    MAX-ACCESS read-only
    STATUS current
    DESCRIPTION
        "This is this system's maximum time into which it can
         predict."
    ::= { lPEntry 11 }

lPGvtUpdate OBJECT-TYPE
    SYNTAX INTEGER (0..2147483647)
    MAX-ACCESS read-only
    STATUS current
    DESCRIPTION
        "This is the GVT update rate."
    ::= { lPEntry 12 }

lPStepSize OBJECT-TYPE
    SYNTAX INTEGER (0..2147483647)
    MAX-ACCESS read-only
    STATUS current
    DESCRIPTION
        "This is the lookahead (Delta) in milliseconds for each
         virtual message as generated from the driving process."
    ::= { lPEntry 13 }

lPReal OBJECT-TYPE
    SYNTAX INTEGER (0..2147483647)
    MAX-ACCESS read-only
    STATUS current
    DESCRIPTION
        "This is the total number of real messages received."
    ::= { lPEntry 14 }

lPVirtual OBJECT-TYPE
    SYNTAX INTEGER (0..2147483647)
    MAX-ACCESS read-only
    STATUS current
    DESCRIPTION
        "This is the total number of virtual messages
         received."
    ::= { lPEntry 15 }

lPNumPkts OBJECT-TYPE
    SYNTAX INTEGER (0..2147483647)
    MAX-ACCESS read-only
    STATUS current
    DESCRIPTION
        "This is the total number of all Atropos packets
         received."
    ::= { lPEntry 16 }

lPNumAnti OBJECT-TYPE
    SYNTAX INTEGER (0..2147483647)
    MAX-ACCESS read-only
    STATUS current
    DESCRIPTION
        "This is the total number of Anti-Messages transmitted
         by this Logical Process."
    ::= { lPEntry 17 }

lPPredAcc OBJECT-TYPE
    SYNTAX DisplayString
    MAX-ACCESS read-only
    STATUS current
    DESCRIPTION
        "This is the prediction accuracy based upon time
         weighted average of the difference between predicted
         and real values."
    ::= { lPEntry 18 }

lPPropX OBJECT-TYPE
    SYNTAX DisplayString
    MAX-ACCESS read-only
    STATUS current
    DESCRIPTION
        "This is the proportion of out-of-order messages
         received at this Logical Process."
    ::= { lPEntry 19 }

lPPropY OBJECT-TYPE
    SYNTAX DisplayString
    MAX-ACCESS read-only
    STATUS current
    DESCRIPTION
        "This is the proportion of out-of-tolerance messages
         received at this Logical Process."
    ::= { lPEntry 20 }

lPETask OBJECT-TYPE
    SYNTAX DisplayString
    MAX-ACCESS read-only
    STATUS current
    DESCRIPTION
        "This is the expected task execution wallclock time for
         this Logical Process."
    ::= { lPEntry 21 }

lPETrb OBJECT-TYPE
    SYNTAX DisplayString
    MAX-ACCESS read-only
    STATUS current
    DESCRIPTION
        "This is the expected wallclock time spent performing a
         rollback for this Logical Process."
    ::= { lPEntry 22 }

lPVmRate OBJECT-TYPE
    SYNTAX DisplayString
    MAX-ACCESS read-only
    STATUS current
    DESCRIPTION
        "This is the rate at which virtual messages were
         processed by this Logical Process."
    ::= { lPEntry 23 }

lPReRate OBJECT-TYPE
    SYNTAX DisplayString
    MAX-ACCESS read-only
    STATUS current
    DESCRIPTION
        "This is the time until next virtual message."
    ::= { lPEntry 24 }

lPSpeedup OBJECT-TYPE
    SYNTAX DisplayString
    MAX-ACCESS read-only
    STATUS current
    DESCRIPTION
        "This is the speedup, ratio of virtual time to wallclock
         time, of this logical process."
    ::= { lPEntry 25 }

lPLookahead OBJECT-TYPE
    SYNTAX DisplayString
    MAX-ACCESS read-only
    STATUS current
    DESCRIPTION
        "This is the expected lookahead in milliseconds of this
         Logical Process."
    ::= { lPEntry 26 }

lPNumNoState OBJECT-TYPE
    SYNTAX INTEGER (0..2147483647)
    MAX-ACCESS read-only
    STATUS current
    DESCRIPTION
        "This is the number of times there was no valid state to
         restore when needed by a rollback or when required to
         check prediction accuracy."
    ::= { lPEntry 27 }

lPStatePred OBJECT-TYPE
    SYNTAX DisplayString
    MAX-ACCESS read-only
    STATUS current
    DESCRIPTION
        "This is the cached value of the state at the nearest
         time to the current time."
    ::= { lPEntry 28 }

lPPktPred OBJECT-TYPE
    SYNTAX DisplayString
    MAX-ACCESS read-only
    STATUS current
    DESCRIPTION
        "This is the predicted value in a virtual message."
    ::= { lPEntry 29 }

lPTdiff OBJECT-TYPE
    SYNTAX DisplayString
    MAX-ACCESS read-only
    STATUS current
    DESCRIPTION
        "This is the time difference between a predicted and an
         actual value."
    ::= { lPEntry 30 }

lPStateError OBJECT-TYPE
    SYNTAX DisplayString
    MAX-ACCESS read-only
    STATUS current
    DESCRIPTION
        "This is the difference between the contents of an
         application value and the state value as seen within
         the virtual message."
    ::= { lPEntry 31 }

lPUptime OBJECT-TYPE
    SYNTAX INTEGER (0..2147483647)
    -- SYNTAX DisplayString
    MAX-ACCESS read-only
    STATUS current
    DESCRIPTION
        "This is the time in milliseconds that Atropos has been
         running on this node."
    ::= { lPEntry 32 }

END

Glossary and Definitions

Algorithmic Sufficient Statistic

The shortest program S* that computes a finite set S containing d on a universal
computer, such that the two-part description consisting of S* and log|S| is as short as the
shortest single program that computes d without input [172].

Complexity Theory

A term used to describe a breadth of disciplines engaged in the study of what makes
something hard and what makes something easy. The sense in which something is
hard or easy separates the varieties of complexity theory. Kolmogorov complexity
considers minimal descriptions less complex than long descriptions. Computational
complexity, on the other hand, considers a problem hard if it requires a long time or a
lot of space on a Turing machine in order to be solved [211].

Computational Complexity

For input of length n, if Turing machine T makes at most t(n) moves before it stops, then
T is said to run in time t(n) and have time complexity t(n). If T uses at most s(n) tape
cells in the same computation, it is said to use s(n) space and have space complexity
s(n) [211].

Computational Mechanics

Refers to the structure of a process, in a class of complexity theory sometimes called
"structural complexity theory," of which computational mechanics is the most
developed [212].

Computational Complexity

The amount of time or memory required to solve a given problem [211].

Entropy Rate

With respect to a stochastic process, the entropy rate is the rate with which the
entropy of a sequence of n random variables grows with n [118].

Inductive Inference

The process of reaching a general conclusion from specific examples, including
conclusions about examples not specified. Generalization, or reasoning from the specific
to the general case [213].

Information Assurance

Information operations (IO) that protect and defend information and information
systems (IS) by ensuring their availability, integrity, authentication, confidentiality, and
non-repudiation. This includes providing for restoration of information systems by
incorporating protection, detection, and reaction capabilities [214].

Kolmogorov Complexity

The length of the smallest program capable of generating a given string without input
on a Universal Turing Machine. Sometimes referred to as descriptive complexity or
Kolmogorov-Chaitin complexity [10].

Minimum Description Length (MDL)

Criteria for inductive inference [215].

Minimum Message Length (MML)

Criteria for inductive inference [14].

Minimum Sufficient Statistic

A statistic that is a function of all other statistics and contains no additional irrelevant
information [118].

Prefix Code

A code in which no codeword is the prefix of another codeword, so that it can
be instantaneously decoded [118].

Prefix Free Program Set

A set of programs such that no program leading to a halting computation is the prefix
of another program [10].

Partial Recursive Functions

The set of functions mapping strings in the set {0,1}* to the finite set {0,1}* or
infinite set {0,1}^∞ that is computable by a Turing Machine [10].

Recursive

A Turing machine that implements a function mapping an input to output and halting
on all inputs is known as recursive. All recursive functions are computable [10].

Sophistication

The minimal length of a total recursive function that leads to an optimal two-part code
for a given object (binary string). One part is the model that comprises the structure or
patterns in the string. The second part consists of random data that identifies the
specific object within the set defined by the model. The minimum sufficient statistic in the
recursive model class [181].

Statistic

A function of a sample of data [118].

Sufficient Statistic

A statistic of a distribution that contains all the information in a sample about the
distribution [118].

Two-part Codes / Two-part Description

A description of a binary string object consisting of two separate parts. The first part
consists of a description of a model or set comprising the compressible parts of the
object. The second part consists of the enumeration of the object given the first part
and can be considered a description of the random aspects of the object [181].

Typical Element of a Set

An element of a set that can be described most succinctly by an index from an
enumeration of all elements of the set. If first describing a subset and then enumerating
the element can more succinctly describe an element of a set, then it is not a typical
element [172].